File System Access API – 16 September 2022

Meeting minutes

kleber: [slide deck] : https://docs.google.com/presentation/d/1QQgrm4oaRRRBr1gfvKj7D8rS2EW8kRgRUHPscvR8BNo/edit#slide=id.g15545e7b627_0_127
… I would like to talk through what is same and different between FLEDGE and PARAKEET
… shared goal: support ad targeting via multiparty process
… ad targeting based on contextual, ad-specific, user-specific information

<npd> "just like the world we live in today" is a strange goal statement

kleber: no ability to recognize user across multiple websites.
… user-specific info all stored on-device
… use case of app-ads is one of our goals, so may be stored "by OS"
… shared JS API to support these goals

<AramZS> Custom Audience is def the standard term for the ad tech/buyer side understanding of this if we're up for a name change.

kleber: willingness to support Privacy-enhancing technologies
… mechanism for rendering chosen ads (fenced frames)
… mechanism for reporting auction outcomes
… (Aggregate Reporting API)
… questions before I go on?
… so, what's different:
… 1) what "servers equipped with Privacy-Enhancing Tech" can do
… 2) How much computation needs to be done on-device
… 3) what scope of user-specific info can be used in ad selection
… these 3 things are heavily interrelated; could look at this number differently
… let's go through each of these three things during this talk.
… 1) what PET servers can do. both proposals have changed over time; we once thought this could be entirely on-device
… adtech folks pointed out realtime data is actually key (e.g., if ad budget has run out!)
… or keeping ads from running on sites they find objectionable
… if a browser is going to touch a server, how do we protect privacy?
… in OG FLEDGE, servers could only be implemented in privacy-preserving ways

<npd> I'm not convinced that real time data is essential, but I can see that there is a preference for it. we could work on both budget and safety questions through other means if there was interest

kleber: OG PARAKEET relied on browser-run servers to see but not store data; would proxy a noisy version to adtech servers

joel: yes. proxy would sufficiently anonymize

kleber: right. Neither of these stories turned out to be realistic.
… at scale
… where we are today: on FLEDGE, browsers can contact servers but only on verifiable builds in Trusted exec environment, still can't store data or user profile, but okay because verifiable.

<dmarti> npd, it has to be "near" real time (for the unsafe ad placement problem, probably a few hours would work)

kleber: on PARAKEET, noisy ranking, must have substantial noising and response caching
… both rely on privacy-enhancing technology servers.
… on how much computation needs to be done on-device: P has always said little is done on-device. F has changed a lot more; used to be very on-device-heavy.

<npd> dmarti, I'm glad to hear there is renewed industry interest in reducing the harms of ads on unsafe sites. it feels a little suspicious that this problem has newly come up in the context of why we can't adopt a privacy-friendly system.

kleber: now have sandboxed code execution inside server.
… maybe: DSP bidding and SSP auction may move to TEEs?
… this would help on devices such as phones

<npd> we could try to explore other mechanisms for blocking dangerous ads or blocking ads from dangerous sites. safebrowsing, e.g. has update mechanisms

kleber: finally, gap #3: scope of user-specific information in ad selection
… P: on-device ad targeting profile is based on web-wide data
… F: bidding of each interest group (custom audience) is based only on data from a single site
… so ad-targeting/machine-learning models cannot learn new information.
… this is fundamentally more conservative, and certainly more limited than what goes on today.
… there's a fundamental question here: is it ok to build a cross-site model?

<npd> it sounds like both are using cross-site data, but Parakeet would be combining data from multiple sites, whereas Fledge would be crossing just from one site to another

eriktaubeneck: (clarification)

<dmarti> npd, thank you -- we have rapidly pausing advertising in general as an advertiser requirement but this needs to be clearer to include rapidly pulling ads from a specific set of problem sites https://github.com/w3c/web-advertising/blob/bdd3224672cde1bb8543ddec798a6ca69ac61a4a/support_for_advertising_use_cases.md#pausing-advertising

aram: wondering on the question of DSP operations: has there been a lot of feedback from DSPs and SSPs? Seems like they might be unwilling to move to a different domain they have less control over.

kleber: we have heard that. I will say: you're right, the bidding operation may contain their company's secret sauce, so they might not want it running in-browser, where it might be reverse-engineered. Not everyone agrees about this.
… some privacy advocates say putting in the public is better.

joel: the big thing adtech doesn't want to do is reveal bids to users, since it would affect the auction.
… the bids themselves can't be revealed in the clear, but locking them in the TEE is likely okay.

aram: the publisher typically plays the bidders against each other. publishers want to monitor the bidding in realtime.

joel: I think we can do that in the TEE too.
… you should be able to run a more fair auction.

aram: sounds fine, though that doesn't seem to be in the model here.

kleber: this is a difference between the transparency necessary when shipping to the device vs in TEE; but the on-device bidding does occur in worklets that are not allowed to leak out to the page.

aram: publishers could deploy those worklets?

kleber: sure.
… in FLEDGE version of TEEs, there's no difference in capabilities between on-device and shipped to the cloud.

benjamin: in isolated worklet, what is the flexibility you have in mind? Originally you could write arbitrary code, but now this is DSL or custom logic that has restrictions?

kleber: in FLEDGE, it's arbitrary code execution. This gets to a bunch of issues that makes shipping to TEE in cloud appealing.
… as Martin pointed out in talk on Tuesday, keeping info isolated on a phone can be quite difficult.
… there are tradeoffs in both directions.

<Zakim> npd, you wanted to ask what happens if users click on ads

npd: one privacy concern we might have is about the use of information, particularly when it is off-device

<AramZS> (worth noting that the on-device process has also faced some trust issues between the bidders, publishers and ad tech vs the device system owners, either the OS or Browser folks)

npd: concerned about limiting disclosure to when you click on ads
… I don't think I usually click on ads.

kleber: in F, model is every ad that's going to be shown, rendering URL of ad needs to be sent to device.
… cannot be influenced by info stored locally (on-device user profile)
… more than that, subject to a k-minimum check
… explicitly to try to prevent info leakage at time of click
… essentially, there's no more info transfer at click time than if the user clicked on a link from the site they were on to the ad site.
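
A minimal sketch of the k-minimum check mentioned above, assuming a hypothetical counter of distinct browsers per rendering URL. The class name, the method names, and the threshold are illustrative assumptions; the minutes do not give a value for k or say how the counts would be collected privately.

```python
from collections import defaultdict

K = 50  # hypothetical threshold; the minutes do not give a value


class RenderUrlCounter:
    """Toy counter of distinct browsers for which each rendering URL was selected."""

    def __init__(self, k=K):
        self.k = k
        self._browsers_by_url = defaultdict(set)

    def record_win(self, render_url, browser_id):
        # Assumption: a plain in-memory set; a real system would need to count privately.
        self._browsers_by_url[render_url].add(browser_id)

    def may_render(self, render_url):
        """Allow rendering only if at least k distinct browsers selected this URL."""
        return len(self._browsers_by_url.get(render_url, ())) >= self.k
```

In this model a rendering URL seen by only a handful of browsers is never shown, so the URL itself cannot act as a per-user identifier at click time.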

npd: I think you're saying all the info is transferred back to the ad site?

kleber: no, only to k-anonymous levels.
… no user-specific data.

npd: has to be selected to be shown, but not clicked?

kleber: yes.

npd: in both cases, it seems all the ad data is revealed when you click

kleber: the fact that you were shown an ad is certainly revealed, yes.
… but that isn't the same question as "can you build a cross-site model"

npd: could simply not share this

kleber: that would remove significant utility from this. information disclosure; it's about leakage over time.

<npd> npd: could mitigate disclosure risks on click by these selected/targeted ads not being clickable

<Zakim> theowarren, you wanted to ask about potential user control differences between FLEDGE & Parakeet targeting

theo: on point 3: the difference between F and P's models. F's model registers you in an interest group. one of failure modes of targeting is when you've already bought something.
… P seems to not have the same kind of affordance.

joel: there's no signal that says "Stop showing this".
… parakeet would like to be able to give the signal to the DSP "don't show this ad"
… not in current P spec

kleber: but in the P model you don't know who "this user" is - you could only do this to a cluster of users

<npd> I would love to see the more detailed proposal about the functionality of users choosing not to see this ad any more

kleber: takeaway from theo's point: in the F model, there's an answer to "why did I see this ad"; in the P model, the mixing of data makes that more difficult.

<AramZS> npd I think it is likely the ad interaction more common than clicking the ad intentionally lol. Just the current system of ad tech makes it so these processes don't work

russ: re: #3 - a publisher knows what the context on the page is, and who the user is directly.
… can't they just show ads directly?

kleber: of course, first-party ad placement is totally ok; if you have that, you don't need this mechanism. Although on-device storage, even for first-party ads, enables features like cross-site frequency capping, which might be useful.

<npd> AramZS (IRC), totally, and I think it's a promising set of functionality to make work for user privacy interests, and an improvement over the status quo (where it does seem like that functionality doesn't work)

<Zakim> AramZS, you wanted to ask about advantages to moving off device for DAI?

aram: the android mention raises an interesting point - dynamic ad insertion - but we probably don't have time to get into this deeply.
… I'm wondering if either of these proposals could support those scenarios - e.g. podcast ads

kleber: when there's a cloud-based TEE, yes, seems like it could. still have problem of what fenced frames is trying to solve, preventing site from getting access to user profile

benjamin: how would you train an ML model in this case?

kleber: if we put all this together, we'll need to figure out this answer.

joel: [has an answer the scribe can't capture that quickly :P]

<Zakim> jyasskin, you wanted to suggest that cross-site on-device profiles are probably not ok because it gets leaked on click.

jeffrey: suggestion of an answer to #3: based on nick's observation, a link click would lead to some ad data,

kleber: no, just that the ad was selected

jeffrey: but some data is revealed. so it seems like there needs to be some limit on the number of ads

kleber: yes, though that just slows down the rate of leakage

jeffrey: is this documented?

kleber: not yet.

<npd> it seems like it could be small but very revealing pieces of information. not just that it's uniquely identifying, but the properties themselves (that the user visited plannedparenthood.org etc.)

link to slides: https://docs.google.com/presentation/d/1QQgrm4oaRRRBr1gfvKj7D8rS2EW8kRgRUHPscvR8BNo/edit#slide=id.g15545e7b627_0_127

https://goo.gle/fps-meeting-notes

<here> oh, wow, that worked

<AramZS> It is interesting to me that this has introduced a whole new idea:

<AramZS> "A multi domain site"

<npd> I don't want to delay the conversation, but as I've noted on some github issues, I don't see how these changes would benefit users

<kleber> AramZS: On the contrary, I think "a multi-domain site" was a thing that already existed, and until 3p cookie removal nobody needed to give it a special name, it was just a website

those would both be great comments to be on the record (i.e. Nick, you should ask on the queue)

<kleber> I guess some people might have talked about "I want to search for this on US Google, not on France Google" in a very small set of situations, but it has been a multi-domain site all along

<AramZS> Kleber: I don't think even most users of 3p would think of it that way. I agree that maybe a few people doing multi-national-TLD maybe, but a lot of people who used 3p for this type of stuff were not so much thinking of a "multi-domain site" as they are thinking about a "multi-site user".

<npd> kleber, indeed, the whole point of having different domain names for different countries may have been explicitly indicating to users that these were different contexts, with different languages or policies

<AramZS> Queue is on the google doc dmarti

<eriktaubeneck_> (added you to the queue on the Google doc dmarti)

<AramZS> Outside of country code variants domains don't really think of themselves as multi-domain sites and users def don't with a very few very popular exceptions.

<AramZS> Google doc is here: https://docs.google.com/document/d/10dMVqt2x8otohdJx4AZYccdX5Hp_lrxlT7QAO2adcfE/edit#

<martinthomson> I just saw a comment on the Zoom from Tim. Can someone remind people not to do that?

<AramZS> It is particularly fascinating to establish this as a term because advertising usually renders domains as wanting to be intentionally different. Unbundling of sites to multiple domains was specifically done in order to avoid the idea of thinking of themselves as "united" in order to have separate targeting.

<npd> my notes on gov.uk: https://github.com/WICG/first-party-sets/issues/102

<martinthomson> removing the eTLD thing is a good start. removing all of the subsets would be better.

<martinthomson> +1 to what Nick is saying here

<AramZS> I still have yet to see an iteration of this proposal that is not better solved by going out in the marketplace and spending the time and money to tell site-owners to merge their domains and I'm not sure these use cases are going in the right direction on that.

<npd> is the purpose improving user prompts? or is it about the user expecting to combine their data between all these sites?

<npd> perhaps I'm just still not getting it, but it feels like "associated domains" is clearly different sites that will combine data for user-hostile reasons, but hey, the harm is limited to 3 domains at a time (for now)

<AramZS> Even worse the sets are, by design, intended to have different privacy models/tos than the rest of the set as far as I understand?

<AramZS> Seems real bad.

<AramZS> Like we should not be allowing systems that do not share the same understanding of privacy or the user's data and where it should go to share that user's data, right?

<eriktaubeneck_> for what it's worth, that can happen currently with different paths on the same domain now

<johannhof> Would you be okay with what you said if those sites passed a storage access prompt?

<AramZS> Sure but that risk grows when we increase security cross domain and then use this as a tool to lower that security.

<AramZS> If the web makes a new promise of privacy to the user and then goes 'oh no, *wave hands* except over here' then our capacity to gain and retain user trust with these measures is injured.

<eriktaubeneck_> +1

<martinthomson> Do we have time in this agenda for discussions about governance?

<eriktaubeneck_> martinthomson worth hopping on the queue

<martinthomson> I keep hearing "maintain status quo" in discussions this week. We should really have a long discussion about whether that is something we want.

<martinthomson> Because I'm fairly sure that I don't want that.

<AramZS> Yes. Same

<johannhof> "Keep websites working"?

<johannhof> Certainly something I want

<npd> is the purpose to meet user expectations, or to improve prompts, or is to prevent website breakage?

<johannhof> "Status quo" obviously has to be understood in context here. We all agree things need to change (we could just not do Privacy Sandbox instead).

<johannhof> Helen meant maintaining the status quo as a contrast to companies updating their privacy policies / branding in response to a "policy" that may change in the future.

<AramZS> I don't think the status quo anyone is worried about here is 'websites load and are readable and users can interact with them' there is a very different status quo under discussion.

<johannhof> npd (IRC): don't understand how those are competing requirements

Blindly maintaining the status quo, as martinthomson suggests, is not necessarily a goal. Ensuring the web continues to be a viable platform for the world, yes.

<sarahheimlich> Notes doc: please sign yourself in: https://docs.google.com/document/d/1RefoawfEnLkGLzGp_zzyy3PJuYI7EtzY5jYTfJo_3c0/edit#

<martinthomson> It's really nice to hear someone say that about video permissions prompts. We've been saying that for years.

<martinthomson> "none of them can egress the data from your machine" is a strong claim

<martinthomson> what evidence do you have in support of that claim?

<kleber> ah I see the queue is not being handled in irc :-)

<martinthomson> so far, I have only gotten a link that I haven't read: https://cseweb.ucsd.edu/~dstefan/pubs/stefan:2012:addressing.pdf

<kleber> cwilso: I don't know how IRC note-taking works with many sessions all back-to-back like this — is there any way to make an edit to the notes from the FLEDGE/PARAKEET session?

<kleber> Those notes say "npd: could simply not share this", but what Nick actually said was "could simply not allow clicks on the ads"

<kleber> npd: fyi

<kleber> CAPTCHA-to-fingerprint pipeline

<martinthomson> I just want to have it on the record that we don't - as a general rule - care about sites consuming CPU

<charlieharrison> I am nervous that browsers would impose policies on these UIs that are so strict (e.g. in the fontpicker case) that innovation is either too difficult or simple enough that we could extend the existing APIs

<charlieharrison> e.g. some ranking function

laka: we have a spec implemented in extension and native mobile browser and agreement from Chromium to allow us to implement experimentally

laka: currently in i2p after a lot of feedback

laka: extension has usage from thousands of sites

laka: time to start implementing natively

laka: not sure whether to move forward with webkit prototype, chromium prototype, or breaking up extension. Which is best for moving this from community group to WG?

yoav: these seem like orthogonal concerns

yoav: a standard is a deliverable of a WG. Before the spec becomes a standard, one of the things you need is multiple implementations. But creating the WG doesn't require an implementation done, and writing an impl into Chromium doesn't require a standard or WG

yoav: these processes are not necessarily linked.

jyasskin: but mozilla and (sometimes) safari need working groups

jyasskin: btw what is a provider

laka: provider pays a page on behalf of user.

laka: spec is written to allow various possible relationships, e.g. a third party provider that has a relationship with the payer and the payee, or a case where the provider is the payer i.e. user

laka: may be an edge case, but as a power user I want to be the provider

laka: back to wg - what happens after we put together a charter

jyasskin: socialize it to w3c staff

laka: we have worked with Ian

laka: shopped it around with Google and they probably won't object but not sure if they will abstain

laka: Moz has been hardest to pin down an opinion

laka: apple and chrome see this as complementary to paymentrequest

jyasskin: doesn't seem like it should be objectionable to Mozilla

laka: we're not sure who's going to fill the gap Peter left when he left Mozilla

laka: ad people might object because it can be a way to replace ads

aramzs: most common use case is actually both, pay + show ads. most publishers have had a way to accept payments and/or show ads so it shouldn't be seen as a new threat to advertisers.

aramzs: publishers will like it and advertising networks won't be bothered by it.

jyasskin: why new WG instead of adding to web payments?

laka: because they mostly focus on a page asking for money/single transactions

laka: this API is for passive ways to pay. No interaction should be required unless it's user-initiated.

jyasskin: other tipping providers might object as potential competitors (patreon etc) but they're probably not members

laka: io is a member

jyasskin: don't need to assume competitors will object

yoav: competitors may like seeing the market grow, also competitors have to have a real reason to object

<jyasskin> https://www.w3.org/groups/wg/payments/ipr <- who'd care about adding this to Web Payments

aramzs: horizontal review - should be concerned about privacy, a11y groups' opinions

aramzs: at least a preliminary look should be solicited

jyasskin: on technical aspects of WG design --- in a CG, any contributor gives patent considerations. In a WG, any member of the group gives patent considerations. One reason to get into web payments is to be sheltered by their patent umbrella

laka: not familiar with patent considerations

laka: interledger has no patents, only a trademark on the name interledger. So does anything in this spec require patent protection?

jyasskin: you should talk to a lawyer before making such claims

laka: back to next steps. What's the most valuable next step to get this into the standard, implementation?

jyasskin: get more users. That motivates implementors.

aramzs: you may not need a specific number of users. But testimonials from users may help.

yoav: wicg gives you contribution guarantees that allow people to contribute to the spec. It's not a traditional community --- it's an umbrella community group. It would be ideal to have a web monetization CG to show that there are many parties beyond interledger who wish to see this in the wild

yoav: there may be one bar for landing code in Chromium, and a higher bar for actually shipping. The risk of the feature being shipped, going unused, being deprecated and removed must be weighed against potential benefits. So showing users and providers and thriving ecosystem exists is key.

laka: this makes me think it's more important to break up the extension than try to implement in chromium. It's going to be easier to keep adding users via extension rather than getting users to flip an about:flags entry

yoav: you could do both simultaneously if you had the resources

laka: probably only have resources for one or the other

jyasskin: proving that it's a good idea via growing the user base seems like the top priority

laka: coil (provider) doesn't make much money from this, interledger makes none (tis a foundation)

laka: coil has other income streams such as Twitch integration

jyasskin: that's another reason to focus on users first.

laka: who's interested in continuing to participate?

aramzs: I am

aramzs: let's connect on slack. https://www.w3.org/wiki/Slack

npd: this is great because it enables alternative business models on the web

laka: thank you for the help and constructive feedback

<npd> npd: potential for decentralization, funding for sites, and possibilities like better privacy support in different business models

laka: please refer to spec on github for feedback going forward

laka: how does WICG feel about continuing to host the spec in WICG / webmonetization

laka: as we move to a WG

yoav: probably best to move it to w3c

aramzs: if you do the repo move correctly, github will automatically redirect the old URL

laka: so long and thanks

<yoav> zakin, end meeting

WICG File System, File System Access

Files on the web are complicated - File API (blobs, files, filereader). After - file API - directories & system API, sandboxed file system, now origin private file system (wednesday topic)

synchronous file system, not a good idea

Then directories upload - not standardized, issues

<martinthomson> Only 27 red marks from respec

Directories & systems API -> moved to FileSystem API by Mozilla

also not standardized

Then File & Directories API -> gave us drag & drop, not w3c, not official, but implemented by all browsers kinda

confusing!

<martinthomson> coincidentally, 3 red marks on one, times 9 red marks on the other, equals 27 red marks on the next

presentation link: https://docs.google.com/presentation/d/1r48bKVJze8E9rHhWG6Ioeyiri0kEuBQRTF6WaCwuiLs/edit?resourcekey=0-pYZw4AzM68ARpZMRjMDUuQ#slide=id.p

<jesup__> @martinthomson Coincidence? I think not... ;-)

So we have pieces of File API and File & Directories Entries API. New -> File System API which supports origin private file system

Where do we go from here?

File System Access API - enable power web apps that allow interaction with files on user's device

simplifying use-cases - seems to unify old file system API functionality into this new API (see all of the old specs / ways to read & acquire files, this unifies them)

FileSystemHandle - modern, promise-based interface

OPFS (origin private file system) split off into WHATWG spec

and part is in WICG

ideally we can unify to WHATWG

See slides for all use-cases now solved by File System Access API

Use-cases: https://docs.google.com/presentation/d/1r48bKVJze8E9rHhWG6Ioeyiri0kEuBQRTF6WaCwuiLs/edit?resourcekey=0-pYZw4AzM68ARpZMRjMDUuQ#slide=id.g1557d7d31ce_0_170

capability summary: https://docs.google.com/presentation/d/1r48bKVJze8E9rHhWG6Ioeyiri0kEuBQRTF6WaCwuiLs/edit?resourcekey=0-pYZw4AzM68ARpZMRjMDUuQ#slide=id.g1557d7d31ce_0_123

editing, enumerating, async interfaces/streams, persisting handles to IDB, moves, change events, async alternative to SyncAccessHandles. WASM things around sync file access & maybe add async access to these types of things from main thread

talking about how to support in-place writes and safe browsing checks at the same time. Copy-on-write for being able to write to a swap file before swapping to the final file written on `.close()`
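
A minimal sketch of the swap-file idea, using ordinary files rather than the browser API. `SwapFileWriter`, `read_text`, and `demo` are hypothetical names for illustration; the model assumes the target is replaced only when `close()` is called, and it does not copy permissions or other attributes from the old file.

```python
import os
import tempfile


class SwapFileWriter:
    """Toy stand-in for a writable stream that stages data in a swap file."""

    def __init__(self, target_path):
        self.target_path = os.path.abspath(target_path)
        # Create the swap file next to the target so the final rename stays on one filesystem.
        fd, self.swap_path = tempfile.mkstemp(
            dir=os.path.dirname(self.target_path), suffix=".swap"
        )
        self._swap = os.fdopen(fd, "w", encoding="utf-8")

    def write(self, text):
        # Writes only touch the swap file; the target is unchanged until close().
        self._swap.write(text)

    def close(self):
        self._swap.close()
        # Assumption: a user agent could run its safe browsing check here, before the swap.
        os.replace(self.swap_path, self.target_path)


def read_text(path):
    with open(path, encoding="utf-8") as f:
        return f.read()


def demo():
    """Open two writers on one target and report its contents at each step."""
    with tempfile.TemporaryDirectory() as folder:
        target = os.path.join(folder, "notes.txt")
        with open(target, "w", encoding="utf-8") as f:
            f.write("original")

        first = SwapFileWriter(target)
        second = SwapFileWriter(target)
        first.write("from first writer")
        second.write("from second writer")

        before_close = read_text(target)
        first.close()
        after_first = read_text(target)
        second.close()
        after_second = read_text(target)
        return before_close, after_first, after_second
```

In this model the target keeps its old contents while writers are open, and if two writers are open against the same target, whichever closes last determines the final contents.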

Use cases! VSCode. IDEs. Office applications (spreadsheet apps, slideshow apps). Graphics editors & drawing tools. Video game level editor tools

How do we do this in a way that protects user privacy & security? Having a handle to a file does NOT mean having access to a file. UA controls this, and can decide between RW, Read-only, and give user control about what they are giving site access to. E.g. expiring permissions, transparency, revoking, etc
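
A toy model of "a handle is not access", assuming the user agent keeps grants in its own table, separate from the handle objects a site holds. `UserAgent`, `FileHandle`, and `query_permission` are hypothetical names, and the "granted"/"prompt" states and "read"/"readwrite" modes are illustrative rather than a statement of what the spec requires.

```python
class UserAgent:
    """Toy user agent that keeps per-handle grants separate from the handles."""

    def __init__(self):
        self._grants = {}  # handle -> "read" or "readwrite"

    def grant(self, handle, mode):
        self._grants[handle] = mode

    def revoke(self, handle):
        self._grants.pop(handle, None)

    def query_permission(self, handle, mode):
        granted = self._grants.get(handle)
        if granted == "readwrite" or (granted == "read" and mode == "read"):
            return "granted"
        return "prompt"


class FileHandle:
    """A reference to a file; every operation re-checks with the user agent."""

    def __init__(self, user_agent, name, contents=""):
        self._ua = user_agent
        self.name = name
        self._contents = contents

    def read(self):
        if self._ua.query_permission(self, "read") != "granted":
            raise PermissionError(f"no read access to {self.name}")
        return self._contents

    def write(self, text):
        if self._ua.query_permission(self, "readwrite") != "granted":
            raise PermissionError(f"no write access to {self.name}")
        self._contents = text
```

Because the check happens on every operation, the user agent can downgrade, expire, or revoke a grant at any time without the site's handle changing.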

API has been in incubation for a while, first version a couple years ago. Things still being hammered out - remove(), move(), unique id (to help de-dup files)

Trying to make the API more usable, prevent permission fatigue.

Ensuring OPFS can co-exist with local file system

New capabilities - file system change events. Better support for long-running file operations (large download can be exposed to user, user can cancel, etc)

End of presentation

sauski: When giving directory access, this is for entire folder tree?

<martinthomson> and symlinked files? and hardlinked files?

asully: yes

christianliebel: Want to stress the importance of this API, thank you. For use-cases - there are also use-cases where you cannot or must not exfiltrate data. In one case, models are so big you cannot access them through the internet, or user data is so big that you cannot load it off the file system. The API was very handy here to access & use this large data. This is the most important API that is currently not supported on all platforms. If it would be, then it would be a huge leap forward for web applications.

ErikAnderson: What is the thinking on persistent permission? With installed PWAs? Slide not having a high-performance cross-site data channel, is that related?

<martinthomson> ErikAnderson: like /etc ?

ErikAnderson: I've also had some websites indicating they want to do stuff with a well-known directory like a user's documents folder. The site wants to indicate the 'well known directory' it is interested in, where the user wouldn't need to be presented with a full file prompt?

<martinthomson> or ~/.ssh

<ErikAnderson> https://wicg.github.io/file-system-access/#enumdef-wellknowndirectory

<ErikAnderson> Like "documents" as defined there

asully: Biggest request is for persisted permissions. Right now, after you get the file & refresh page, that permission is gone. the ACL is gone when the browsing context goes away. The UA can decide when these persist. When you refresh, maybe you don't reset permissions, but maybe 1 month from now, the access might go away. We want access to match user's intent.

<martinthomson> oh, that's probably better, though I don't know that I think it is entirely safe

<tomayac> Dangerous folders are blocked: https://wicg.github.io/file-system-access/#wellknowndirectory-too-sensitive-or-dangerous

asully: we have definitely explored things like using signals like installation to indicate persistence with that site.

<martinthomson> tomayac: seems like a job for life right there

<ErikAnderson> Yes, it terrifies me a bit. I think it would need to probably be an installed app or otherwise having some much higher trust indicator.

<martinthomson> ErikAnderson: yeah, once you have "installed" something, I think we can do a lot more

<benmorss_> We've discussed this a lot internally... I'd really wonder how long people think permissions should persist

<benmorss_> after tab closure

asully: with regards to permanent access like Photos app having access to photos directory, there is a field in the API called 'start in', which allows the dev to set the default directory. Not currently planning on exposing a way to ....

<tomayac> martinthomson: You also cannot get direct access to ~/Downloads, just subfolders, to prevent it from becoming a location for supercookies…

asully: We are currently blocking 'documents' directory to prevent abuse (not give access to important folder like that)
… The start-in field can also specify a previous directory that was used / returned by the API

ErikAnderson: Interested maybe in "If installed, then what?" maybe later

<ErikAnderson> Downloads is also scary from a DLL planting attack perspective. Lots of installers are susceptible.

martinthomson: Concurrent access to file, how do you manage that?

asully: For the access handles API (OPFS), that is exclusive, so no worries there. Outside that, creating more copies of writable file stream creates multiple swap files.

martinthomson: So when the swap is closed, it overwrites the file. Last one to close wins? Yes.
… If the native application has a handle on the file, and is actively using it, and the file swaps in to 'gain the mark of the web', are we confident that the native application is going to recognize the mark of the web now?

asully: Something we haven't explored is file locking. There are some concepts, but in flux. One thing that we have explored - you can't say "this file is locked, you cannot access". Mostly advisory locks. If a native app ignores them, we can't stop it. We have thought about adding advisory locks

martinthomson: Does the browser respect the advisory locks? Yes. Is that required in the specification?

asully: No, that is in flux, we are exploring file locking still

jsbell: To clarify, exploring file locking for the OPFS, but locking outside the OPFS, nothing is in the spec about that, and unclear that we would.

asully: That is something that we could specify, taking/respecting advisory locks
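
A small sketch of what "advisory" means in practice, assuming a Unix-like system where Python's `fcntl.flock` is available. `demo_advisory_lock` is a hypothetical name; the three file objects stand in for a lock holder, an application that asks for the lock first, and an application that never asks.

```python
import fcntl
import os
import tempfile


def demo_advisory_lock():
    fd, path = tempfile.mkstemp()
    os.close(fd)
    try:
        with open(path, "w") as holder:
            # The lock holder takes an exclusive advisory lock.
            fcntl.flock(holder, fcntl.LOCK_EX)

            # A cooperative app asks for the lock first and is refused.
            with open(path, "a") as cooperative:
                try:
                    fcntl.flock(cooperative, fcntl.LOCK_EX | fcntl.LOCK_NB)
                    cooperative_got_lock = True
                except BlockingIOError:
                    cooperative_got_lock = False

            # An app that never asks for the lock is not stopped by it.
            with open(path, "a") as ignoring:
                ignoring.write("written without asking")

        with open(path) as f:
            return cooperative_got_lock, f.read()
    finally:
        os.remove(path)
```

The lock only protects the file from writers that check for it, which is why respecting advisory locks would have to be something the browser opts into rather than something the operating system enforces.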

ErikAnderson: If the file is read-only on windows, what would happen?

asully: The site would immediately reject a readwrite request to that file.

<martinthomson> ErikAnderson: are the read/write flags a property of the directory?

ErikAnderson: If a file on disk doesn't have a mark of the web, and then open using this API writes to it, are we expecting browsers to add 'mark of the web'?

asully: Yes

martinthomson: How do you manage symlinks that go elsewhere? (We don't follow them). And if it's a hardlink?

reillyg: Because we use a swap file, it deletes the old file and puts the new version there.
… if on windows, if you have a file open, you can't overwrite a file that is open. So it would fail in different ways. As long as the other application still had it open, there wouldn't be a conflict, because it would still be old file / new file

jbroman: Permission question.. (answer is that we copy permissions to new file from old file)

reillyg: This is done when the swap file was created, not when swap file is swapped

asully: This is not specified at the moment

<jbroman> also owner, ACLs, extended attributes, etc

jsbell: This might not be spec but just normative behavior description on a per-os basis.

jsbell: If we are wrong and some MUST be specified, then we will

martinthomson: There are things that are part of the files that we can reason about, but there is state that exists outside of the files (applications, in users' heads, etc) that needs more consideration as well.
… We have struggled with reasoning about how people think about files as they change. mark-of-the-web helps, but when you are talking about arbitrary files in filesystem with established expectation of their role in people's lives, it makes it difficult to know if this is safe to do (permission prompt aside). Don't want to put too many security consequences behind a 'yes'.

jsbell: Definitely dichotomy between old world where all native apps could do anything, and now native applications are winding back a lot of that access

martinthomson: We have a reasonably good understanding about consequences of downloading a file into a downloads folder. We have OSs with understanding of what that means, has warnings, etc. For files that have existing presence in applications & user expectations that pre-exist interactions on the web, it changes from something that I'm using from applications X, to something that could have been touched by the web in some way

tomayac: So the concern is that there is a file maybe in a subtree that was granted access to a web app, and that web app could have changed that sub-tree file and that change was a surprise to the user?

martinthomson: Yes that is an example. I don't know how to manage that, maybe we need to learn about what people expect, mark of the web, other paradigms for the OS, etc. Those sorts of questions are what I'm interested in.

martinthomson: Both write and read have implications

theowarren_: Started career when crypto locker attacks were common. Is that not more accessible with this API? You grant access to a folder, and then bam, now it's gone & encrypted. Is that reasonable?

asully: Yes. We have safeguards around things like trying to write an .exe. Trying to help users from doing a bad thing

<martinthomson> can I write a bash script on windows, which might then enter the path in a WSL image?

<martinthomson> how good is the protection against writing executables?

theowarren_: Not a ton of guards on directory case then? At which point in the flow of granting write access to a directory do we have a way to tell the user 'maybe don't do this'?

martinthomson: Warnings don't fix this problem. If consequences have that magnitude, then we shouldn't do this.

benmorss_: What about just downloading executables and executing? corporate enterprises protect this.

benmorss_: Directory access requires a similar level of concern to writing an .exe, then.

(which can be done from downloads?)

theowarren_: Do we have safebrowsing protections here?

jsbell: We also have done things in the past for directory upload, where when you are uploading / reading more than 10 files, then we show additional prompting. Room for more exploration here.

jesup__: There is also the fact that users have years of experience about downloading from web is dangerous. They don't have that about these prompts.

asully: The prompts are a user-agent implementation detail

tomayac: Concern here is about the wrapper being writable. There used to be 3 prompts to access files in a folder. But it makes sense that if you grant access to a folder, it might be bad. At the same time we have <input type="file" webkitdirectory>, which can read the entire filesystem. There is a status quo of things being bad. If the spec says that user agents can make some directories writable, but user agents can have directories read-only, then
… we would be in the same place as now. But, VSCode usage would be very annoying - you have to grant access to every file in a folder etc.

theowarren: Am more concerned about the 'write' scenario.

dmurph: Are warnings like "hey this is trying to delete folder" or "trying to write to a lot of files" sufficient for making this better?

theowarren: I think we should start more with that, yes.

benmorss_: Well we should try to figure those out now here!

reillyg: We don't really know what the user's mental model is here, we can do a lot to imagine what the user's model might be, but we need to accept that we are going to need to be reactive to what actually happens in the world when this is available.

theowarren: Are we tracking that?

estade: So the difference between this and downloading an executable is that users have already done that?

theowarren: There is browser UX and a lot of existing user education and mitigation at the OS level for stuff that ends up in the downloads folder. Like Windows Defender for things put in the downloads folder.

benmorss_: Help us make it better

martinthomson: The path for us implementing this is possible, but the argument that this isn't much worse than the current state isn't persuasive.

reillyg: Open, save back to file, is something that users expect. The model now of opening file, and then explicitly choose to overwrite it by choosing it all over again (or have it get put into downloads folder), is asking a lot. Attempting to help users keep their data in the location they originally selected seems rather valuable. This can fit into a more simplified version of interaction. VS a directory where when you are opening a
… directory, you aren't saving over the whole directory. So it is more difficult to create a clear flow of saving a set of files back to a directory.

asully: Closing thoughts, I would argue that yes, write is a lot more scary than read, but as I tried to express in the first few slides, files are complicated right now. If a user agent only exposes the read-only version of the API, that ONLY makes developers' lives easier.

theowarren: I agree

<jesup__> +1 to the speaker from webkit

dcrousso: I don't think anyone is disagreeing that the current file flow is bad. I don't like that websites are now the ones writing to the file, and I don't trust websites doing that. I think improving the experience seems orthogonal. Opening up this side-channel scares us.

Slideset: https://docs.google.com/presentation/d/1LwLYZjGq0tVwvttJzmBqjJ1ijJb2JZ5X-q-AZcN6kbU/edit?usp=sharing

estade: welcome
… storage buckets is a meta-API in the storage space, not a new way of storing data
… a better way to organize data stored by other APIs
… a few goals; the primary one is to allow authors to store their data in a way that minimizes the chance of important data being evicted due to low storage space
… currently, when there is storage pressure, the UA evicts data
… e.g., evicting all the data for c.com
… this means that a.com, b.com have no reason to worry about their usage, except for their own quota
… navigator.storage.persist blocks all of your storage from being evicted
… but it's a large hammer -- not all of your data is that important
… buckets allow authors to express relative importance of their data
… there's also the goal that data management be less manual than it is now
… want data that a site might want to store to be manageable en masse in a simpler, more ergonomic manner
… storage APIs don't have to reinvent the wheel
… without buckets, there is one big bucket
… with buckets, within each bucket is data associated with one or more storage APIs
… it should be possible for the UA to be smarter about eviction
… rather than clobbering all of one origin's data, particular buckets are evicted
… e.g., email client
… two broad kinds of data: messages in the user's inbox, synced from server; drafts that a user is working on which are not yet stored on the server
… right now, they compete for quota
… when a site runs out, it cannot store any more drafts; if the site is evicted, all the drafts are thrown out
… with storage buckets, a site could specify that a local cache of remote data is put into a bucket that's okay to evict
… the local data that's not cached anywhere goes into a "drafts" bucket, where the site specifies that it is more important, please do not evict
… eviction and durability:
… generally going to be a correlation between whether something can be evicted and whether you want it to be durable
… another motivating example: storing all of the data for a single user in one bucket, data for another user in another bucket (multi sign in case)
… when a user signs out, the site can easily toss out all of the data associated with that user
… rather than having to do so for each API
… example: an important bucket
… durability -- whether when there's a write, it's a synchronous write to minimize chance of data loss due to power loss by flushing to disk, at a performance cost
… persisted -- whether this data should be subject to eviction
… of course, the UA is allowed to reject these policies -- they're best-effort
… but it should let the site know whether the policy succeeded
… for example, a UA might restrict the ability to persist data to certain sites, or require a user permission
… when persisted, can only be deleted by site or user explicitly, not to free up space
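
The email-client example above can be sketched as a small simulation. This is a simplified model, not the Storage Buckets API itself: the ModelBucket shape, the evictUnderPressure helper, and the "largest first" ordering are all assumptions made for illustration.

```typescript
// Simplified model of bucket-aware eviction; the real algorithm is up to the user agent.
interface ModelBucket {
  origin: string;
  name: string;
  bytes: number;
  persisted: boolean; // true: removed only by the site or the user, never to free space
}

interface EvictionResult {
  kept: ModelBucket[];
  evicted: ModelBucket[];
}

// Evicts non-persisted buckets, largest first, until at least bytesNeeded is freed.
// "Largest first" is an arbitrary choice made for this sketch.
function evictUnderPressure(buckets: ModelBucket[], bytesNeeded: number): EvictionResult {
  const candidates = buckets
    .filter((b) => !b.persisted)
    .sort((x, y) => y.bytes - x.bytes);
  const evicted: ModelBucket[] = [];
  let freed = 0;
  for (const bucket of candidates) {
    if (freed >= bytesNeeded) break;
    evicted.push(bucket);
    freed += bucket.bytes;
  }
  const kept = buckets.filter((b) => evicted.indexOf(b) === -1);
  return { kept, evicted };
}

const mailExample: ModelBucket[] = [
  { origin: "https://mail.example", name: "inbox-cache", bytes: 500, persisted: false },
  { origin: "https://mail.example", name: "drafts", bytes: 20, persisted: true },
  { origin: "https://news.example", name: "articles", bytes: 300, persisted: false },
];
const pressureResult = evictUnderPressure(mailExample, 400);
const evictedNames = pressureResult.evicted.map((b) => b.name).join(",");
const keptNames = pressureResult.kept.map((b) => b.name).join(",");
```

Here only the re-fetchable inbox cache is dropped, while the unsynced drafts survive. Without buckets, the same pressure would clobber everything stored by one origin, drafts included.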

hober: that doesn't seem great
… if there's sufficient storage pressure, something's got to go
… if everybody says persisted: true, there's a race to the bottom

estade: 1) every site is still subject to a quota
… 2) user agent is encouraged to place some sort of restriction on the kind of sites that can persist data, not arbitrary sites on the drive-by web

hober: even on those sites that get that most of the time, it's only most of the time
… if circumstances are extreme enough, the browser _will_ evict you

estade: it's best-effort in that if you request it, you may or may not get it
… if you do get it, you won't be evicted

hober: can't tell at creation time that extreme circumstances will exist a month from now

estade: these policies are designed to future-proof storage APIs
… they actually both come from existing policies
… you can already use navigator.storage.persist to get the same behavior
… applies to all storage APIs, just couldn't previously do it per-bucket

hober: did it have the same [persistence] guarantee?

jsbell: currently implemented in Firefox and Chromium-based browsers
… Firefox shows a prompt; Chrome relies on site engagement heuristics
… because sites have stored a lot of data, or general OS space crunch, browser has to make a choice
… don't remember Chrome's current behavior, but Android if the device is low on space, will show UI to help users clean up

estade: on a mobile device, you might be warned you have no space, can go into app settings to clear app space at the OS level
… similarly, the UA can have a dialog that shows how much space each site is using and delete it
… but it won't be deleted without user interaction

hober: I expect UAs to want the ability to treat persisted: true as "I'll try much harder than for other storage, but if I can't, game's over"

jsbell: in that case, Chrome will interrupt the user and ask them

estade: we don't intend to closely spec the eviction algorithm
… it's possible another UA could behave differently
… e.g., some UA might decide durability should be some heuristic

hober: I agree; you started with "must"

estade: more assertive than I intended

jsbell: original iteration for navigator.storage.persist -- if persist is granted, sites should be able to assert to the user that data they store locally won't go away

estade: it will lead to a poor UX if it's something they and the app expect to not magically go away
… if a native app uses too much space, the OS doesn't go around deleting data automatically
… that's the idea here
… durability is a string that has two values; persisted is a boolean
… these are chosen to match pre-existing storage APIs
… another thing you can do with storage buckets is manage your own quota
… sites are limited in how much they can store, and are motivated to stay under that limit or their app will stop working
… they may have a bucket which is less important than another bucket, they don't want it to take up all the space, so they can give it a quota
… expiration also improves ergonomics; data will automatically be deleted at that time, subject to certain guarantees
… can delete an entire bucket
… APIs are promise-based
… has integration with Clear-Site-Data, can use header to delete a bucket
… counterintuitively, service workers can be in buckets even though they're not affected by quota
… if serviceworker relies on data in bucket, no sense spinning it up only to find that data is gone
… lets you tie semantically-related parts of your app together
… right now, we're implementing in Chrome
… there are parts of the API you can use, and parts that are silently unimplemented
… by EOQ4, will be ready for broader adoption and testing
… in some sort of trial that developers can check out
… we're here to get feedback
… Q&A; we have some questions of our own as starters, but welcome questions from others too
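
A rough sketch of the per-bucket quota and expiration ideas from this part of the talk, written as a toy model. QuotaBucket and its methods are hypothetical names, not the API surface, and timestamps are plain millisecond numbers supplied by the caller:

```typescript
// Toy model of a per-bucket quota and expiration; QuotaBucket is a hypothetical name.
class QuotaBucket {
  readonly name: string;
  readonly quotaBytes: number;
  readonly expiresAtMs: number | null; // null: no expiration requested
  private usedBytes: number = 0;

  constructor(name: string, quotaBytes: number, expiresAtMs: number | null) {
    this.name = name;
    this.quotaBytes = quotaBytes;
    this.expiresAtMs = expiresAtMs;
  }

  // Returns false instead of storing when the write would exceed this bucket's quota.
  tryWrite(bytes: number): boolean {
    if (bytes < 0 || this.usedBytes + bytes > this.quotaBytes) return false;
    this.usedBytes += bytes;
    return true;
  }

  used(): number {
    return this.usedBytes;
  }

  // In this model the bucket's data is treated as gone once this returns true.
  isExpired(nowMs: number): boolean {
    return this.expiresAtMs !== null && nowMs >= this.expiresAtMs;
  }
}

const thumbnails = new QuotaBucket("thumbnails", 100, 1000);
const firstWriteOk = thumbnails.tryWrite(80);  // fits within the 100-byte quota
const secondWriteOk = thumbnails.tryWrite(30); // would reach 110 bytes, so it is refused
const usedAfterWrites = thumbnails.used();
const expiredEarly = thumbnails.isExpired(999);
const expiredLate = thumbnails.isExpired(1000);
```

Capping a less important bucket this way keeps it from consuming the site's whole allowance, so a refused write to "thumbnails" does not stop writes to a more important bucket.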

hober: can you call .delete to delete the default bucket?

estade: default bucket has a reserved name
… there are four policies
… some can be changed after creation, some not
… open question whether you can open or operate on the default bucket, e.g. deleting it
… haven't decided one way or another, but probably don't want to allow changing default bucket's policies

hober: I can imagine a site wanting to intentionally restrict itself with quota, for instance
… e.g. setting a policy over a large website that we'll only use 10 MB

estade: they could do this with buckets, never using default bucket

hober: harder in an existing codebase

estade: one of the primary motivators would be to make it easier to migrate to using buckets
… without substantially changing your existing code
… don't currently have a way to migrate data from one bucket to another
… probably "yes, with restrictions"
… expiration is pretty much guaranteed
… we have to look at our own implementation and verify that there's no way that this will break assumptions
… if you can open the default bucket --
… you can get at that data using the existing APIs without buckets --
… there's a lot of things that touch the default bucket currently; need to think about all the ways these interact
… there are probably some legitimate use cases

dhuigens: this seems useful
… as far as I understand, current policies are per-site
… if you have different origins, you have to coordinate storage usage between them
… this seems useful for that, to be able to say mail can use this amount or %
… I think it might be useful to have more granular persistence indications
… for example, if you have some data that's expensive to compute, but you could recompute it
… you might want to say "this is less important than that data, but more important than that data"
… maybe an integer priority instead of a boolean?
… possibly out of scope, but even for items within an IndexedDB
… e.g., a cache ordered by date -- recent items are more important than older items -- but maybe too complicated

estade: good point re. integers
… might add complexity to the API and it's unclear if these integers are well-calibrated across sites
… e.g., do you delete priority 9 across sites at the same time?
… don't want sites to compete

dhuigens: purely within one site
… start with lowest priority data within that site first
… I do also see the value of a boolean guarantee
… so you can promise to the user
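
The suggestion above (an integer priority used purely within one site, alongside a boolean guarantee) could look roughly like the following. The priority field is hypothetical and not part of the proposed API; the sketch only illustrates the ordering being asked for.

```typescript
interface PrioritizedBucket {
  name: string;
  bytes: number;
  priority: number;   // hypothetical field: higher means more important to the site
  persisted: boolean; // boolean guarantee: never evicted to free space
}

// Returns the names of one site's buckets to evict, lowest priority first,
// stopping once at least bytesNeeded would be freed.
function pickEvictions(siteBuckets: PrioritizedBucket[], bytesNeeded: number): string[] {
  const order = siteBuckets
    .filter((b) => !b.persisted)
    .sort((x, y) => x.priority - y.priority);
  const names: string[] = [];
  let freed = 0;
  for (const bucket of order) {
    if (freed >= bytesNeeded) break;
    names.push(bucket.name);
    freed += bucket.bytes;
  }
  return names;
}

const prioritizedSite: PrioritizedBucket[] = [
  { name: "search-index", bytes: 200, priority: 5, persisted: false }, // expensive to recompute
  { name: "image-cache", bytes: 300, priority: 1, persisted: false },  // cheap to re-fetch
  { name: "drafts", bytes: 10, priority: 9, persisted: true },
];
const smallEviction = pickEvictions(prioritizedSite, 250).join(",");
const largeEviction = pickEvictions(prioritizedSite, 450).join(",");
```

Because the ordering is only ever applied to a single site's buckets, the numbers never need to be calibrated across sites, which addresses the concern about sites competing on priority values.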

jsbell: re. IndexedDB
… buckets is one of the ways we want to solve that, without having part of the IndexedDB going away at any time

christianliebel: StorageManager/persist also works in WebKit
… really like this, it makes developers' lives much easier
… e.g., deleting per-user storage on logout
… encryption -- is there any update? is it still planned?
… developers are concerned that if someone has access to their site, they don't want all their storage to leak

ayui: not in our MVP, want to think deeply about it
… especially with more use cases

ErikAnderson: +1 on the statement about persisted attribute might be a bit coarse
… don't know if integer priority is the right thing, or some optional explicit signal, like "this bucket is needed for offline experience"
… vary behavior for an installed PWA
… don't know the right set, but suspect app developers could indicate

estade: user agent could take into account "is this an installed PWA", "how recently has this been used"
… certainly some site from 2 months ago shouldn't be allowed to take up space under storage pressure over PWA you use every day

ErikAnderson: feels like there might be a few buckets and an app might have a preference about what goes away first

jsbell: browser quota management and eviction is radically different across browsers today
… hopefully that means we're innovating, not locked in
… but that's why we're treading carefully here
… Pete from our DevRel team wrote an article

ErikAnderson: I know on Windows, Disk Cleanup Wizard has a pluggable model for apps that may want to opt into helping the user free up disk space
… should a ServiceWorker get a callback and a limited amount of time to free up space?

dmurph: in Japan in 2019, Andrew Sutherland from Mozilla and us whiteboarded to come up with a v0 prototype
… wondering if Mozilla is still involved or not

estade: that is a question we have for the group; we haven't been in contact lately
… we would like to reopen those avenues of feedback and see how much interest there is from other vendors, including Mozilla
… our other questions for the group:
… other important use cases that might shape the API?
… what seems missing? there's some feedback about encryption, granularity
… you can also get a lock out of a bucket, which is perhaps an ugly duckling compared to quota-managed APIs, but along the lines of ergonomics/organization
… bucket is being used as a namespace for the lock

<cwilso> Pete's article, btw: https://web.dev/storage-for-the-web/

estade: if people find that useful, great; it's perhaps unintuitive because you might assume it has something more to do with the bucket
… one thing we haven't thought through the implications of is nested buckets
… we've talked about one app with multiple users, multiple apps, multiple kinds of data -- but if these add together, you might want nested buckets
… opens up questions like "are these policies nested?" "what if an outer one is not persistent but an inner one is?"
… doesn't seem insurmountable; if we get signal that this is useful, it would be helpful to us prioritizing
… longer format of this can be found by reading our explainer
… including alternatives considered
… feedback so far is great; please keep it coming

<estade> https://github.com/WICG/storage-buckets/issues

<reillyg> Presenting: https://goo.gle/tpac2022-isolated-web-apps

reillyg: observation - developers seem to want to build applications using web technologies even when they aren't published on the web, e.g. Electron, React Native, Node.js, Deno
… also API experience & knowledge sharing. New APIs are being created, but also adopt web APIs because developers like consistency
… We also see some applications need protection from compromise of their infrastructure. E.g. Signal. They used a Chrome App, which is like an extension that acts like an app. They liked that, and they don't like that we are deprecating that.
… why? Because on the web you can make a code change at any time, and you don't have protections against a compromised infrastructure. They liked the explicit update step.
… 3rd, there are some capabilities that are too powerful for the web platform trust model. There is an open lack of consensus on this: device APIs, WebUSB, etc. Although some things fully just don't make sense & is less trustworthy than other kinds of applications.

reillyg: Proposal: Standardize an alternative model for distributing web content which provides stronger trust signals and robust integrity protections.
… First way to do that is packaging. You need to be able to package your content for integrity. It seems like installation is a prerequisite for integrity always. This has to happen before the application is launched.
… We are doing this in a web bundle, but we are doing this in a different way than other proposals. These are signed (different from signed exchanges). The bundle is signed, the things contained are not. So you can look at an entire bundle and say that this thing is a unified version of an application, verified to a developer, and can have signings from other sources of trust.
… Because they are in packages, we don't give them the http/s scheme. We are not trying to recreate a way of loading content over https; we recognize that because these things are coming from a package the user installed from somewhere else, it's not an origin. In the isolated-app:// scheme, each bundle is its own origin in that scheme.
… Update is an explicit step that replaces the bundle on the same origin.
… Integrity: There is a default content security policy and trusted types are required. All permissions are off by default. If you are going to use a policy controlled feature, you must declare this up front.
… Isolation: All existing site isolation work still applies. Storage is isolated from other web content. They cannot be iframed, and must be launched from the 'standalone' desktop UI. The identity of the application that the user will understand is more about the name and icon of the application.
… navigations into the applications work more like launching a native app. You can only launch the app in ways the application declared that it could be launched (declares entry points). Out of scope navigations just jump into the OS default browser. App loses control of that window.
… Open for discussion!

dhuigens: Thanks for thinking about this, applies to Proton as we are using client-side encryption in all web apps. Maybe one thing still missing is some mechanism to ensure that a given version of a given web app is the same for everyone that accesses it. It would be nice to have certificate transparency or binary transparency so security researchers can look and verify. Is that in scope?

reillyg: The current proposal doesn't mandate a particular way of .... The one that is currently being prototyped in Chrome is simple: you install the app, and it will only update locally. For testing. One option, probably the initial one for how this would work, is something more akin to code signing on Windows or macOS, where the developer registers with the vendor and can notarize the application. So it is signed by a third-party service, which creates a record for
… which versions of the application exist. Can also create a rule like versions must be monotonically increasing.
… A less simple version implements certificate transparency ideas, some distributed stuff [scribe didn't hear], but those asks are a goal we have.
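
One way to picture the "versions must only increase" rule mentioned above is an update check that refuses anything not strictly newer than the installed bundle. This is a sketch under assumptions: the dotted-integer version format and the function names are hypothetical, and nothing here is taken from the Isolated Web Apps proposal itself.

```typescript
// Hypothetical update rule: accept a bundle only if its version is strictly greater
// than the installed one. Versions are assumed to be dotted non-negative integers.
function parseVersion(version: string): number[] {
  return version.split(".").map((part) => {
    if (!/^\d+$/.test(part)) throw new Error("invalid version component: " + part);
    return Number(part);
  });
}

// Returns -1, 0, or 1; missing trailing components count as 0 (so "1.2" equals "1.2.0").
function compareVersions(a: string, b: string): number {
  const pa = parseVersion(a);
  const pb = parseVersion(b);
  const length = Math.max(pa.length, pb.length);
  for (let i = 0; i < length; i++) {
    const x = pa[i] ?? 0;
    const y = pb[i] ?? 0;
    if (x !== y) return x < y ? -1 : 1;
  }
  return 0;
}

function isAcceptableUpdate(installed: string, candidate: string): boolean {
  return compareVersions(candidate, installed) > 0;
}

const upgradeOk = isAcceptableUpdate("1.9.0", "1.10.0");  // compared numerically, not lexically
const sameVersionOk = isAcceptableUpdate("1.2.0", "1.2"); // equal versions are rejected
const downgradeOk = isAcceptableUpdate("2.0.0", "1.9.9"); // rollbacks are rejected
```

Rejecting equal and lower versions means a compromised distribution channel cannot quietly roll users back to an older, vulnerable build or swap in different content under an already-recorded version number.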

dcrousso: Are there any plans for a way to allow the bundle to mark itself as completely isolated from the network?

reillyg: That's a great question, that is the first question we got when we published the explainer. Not currently possible, but seems feasible/reasonable.
… Currently you can make network requests, but you cannot load code / executable content from outside.
… But you could feasibly say you can only load things from the bundle. That would be reasonable. It makes me a little nervous because there are just so many places where the web can make network requests. But given that the origin of these things is cross-origin from the entire web, that's cross-origin, and its restriction-engine would already be working.
… classic example is 'pdf converter' or things like that.
… so that is a really plausible opportunity for building network-blind applications. You could even imagine that you could design an API to have more control over the network access?
… The hardest thing to fix right now is navigation-based data leaks. If there is no CSP for navigation.... but I think I was talking to someone on the Chrome security team that this was a thing that people were thinking of as an option, so if that is made then that could be used here.

dhuigens: One minor comment, opposite with CSP. If we just allow https for sandboxed iframes but not for blob urls, then that kind of incentivizes putting stuff on https and making external requests in some sense.
… for example, email client showing email in sandboxed iframe. Being able to show a blob url there would be useful.

reillyg: I agree, we do want... there is a tradeoff we want to make around script-generated content. We want developer to be able to put all script in the bundle, but there are always ways for developers to execute script that is fetched from a CSP. This could also be something that the trust authority could have a policy about this - they don't sign / distribute apps that have network access or something...
… manifest fields that let you explicitly extend the CSP in ways that aren't really dangerous but need to be specified. One example is nonces
… CSP headers AND together instead of OR together. Forget the explicit example - I think CSP nonce, like an analytics script, then you would have to add that to the existing header, you can't add an additional header with that additional thing in it. So it may make sense to have a place to add a specific set of attributes to CSP, but you have to get into the details.
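
The point about CSP headers combining with AND rather than OR can be illustrated with a deliberately simplified model in which each policy is just a list of exact-match script sources. The helper name and the example hosts are assumptions; real CSP matching is considerably more involved.

```typescript
// Each policy is simplified to a list of exact-match script sources.
type ScriptSources = string[];

// A script source loads only if every delivered policy allows it (AND, not OR).
function isScriptAllowed(policies: ScriptSources[], source: string): boolean {
  return policies.every((allowed) => allowed.indexOf(source) !== -1);
}

const defaultPolicy: ScriptSources = ["'self'"];
const extraHeader: ScriptSources = ["'self'", "https://analytics.example"];

// A second, more permissive header does not help: the default policy still blocks the source.
const viaExtraHeader = isScriptAllowed([defaultPolicy, extraHeader], "https://analytics.example");

// Extending the single existing policy is what actually permits the source.
const extendedPolicy: ScriptSources = defaultPolicy.concat(["https://analytics.example"]);
const viaExtendedPolicy = isScriptAllowed([extendedPolicy], "https://analytics.example");
```

This is why an app-level way to extend the default policy, such as the manifest field floated above, would be needed: sending another header can only tighten what is allowed, never loosen it.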

dcrousso: What is the imagined use case for 3rd party iframes? Why?