12:34:23 RRSAgent has joined #social
12:34:27 logging to https://www.w3.org/2023/09/13-social-irc
12:34:27 RRSAgent, make logs Public
12:34:28 please title this meeting ("meeting: ..."), pchampin
12:34:58 meeting: SocialWeb data portability - TPAC 2023
12:35:10 scribe: eprodrom
12:35:38 chair: dmitriz
12:36:25 dmitriz: this topic is very popular. A critical feature of interoperability is data portability between ActivityPub instances
12:36:40 dmitriz: microblogging, forums
12:37:56 dmitriz: migrate accounts (actor profiles), migrate content
12:37:56 dmitriz: including social graph (followers, following), events, content
12:37:56 q+
12:37:56 dmitriz: talk about in-progress specifications
12:38:28 present+
12:38:28 present+
12:38:28 present+
12:38:28 present+
12:38:28 present+
12:38:36 scribe+
12:38:37 tantek has changed the topic to: Social Web Incubation Community Group (AKA SocialCG / SWICG) https://www.w3.org/wiki/SocialCG. Chat log: https://chat.indieweb.org/social/ currently W3C TPAC session: https://www.w3.org/events/meetings/46ce9082-710a-4fe2-8cf8-5dcdc207c877/
12:38:40 eprodrom: One agenda item I'd like to add is - I did a portability report https://w3c.github.io/activitypub/data-portability-report.html
12:38:48 eprodrom: propose reviewing https://w3c.github.io/activitypub/data-portability-report.html
12:38:51 n8s has joined #social
12:39:30 eprodrom: the goal of this report was not to propose any new data portability systems, but to cover the state of data portability in the AP world, today in 2023
12:39:43 ... in particular, this would be guidance for developers and end users, to understand what their options are
12:39:44 bumblefudge__ has joined #social
12:39:45 ^ would suggest putting that in the Abstract of the report
12:39:47 https://w3c.github.io/activitypub/
12:40:00 ... first section, I gave a probably exhaustive catalogue of what 'Your Data' might mean
12:40:16 ... in two major sections.
When we talk about a federated system, there is the data that exists on your own account server,
12:40:36 ... but then of course there's the data on other servers
12:40:51 ... so, on your own servers, there's your actor id / identity, various URIs, actor profile properties (name, avatar, links, etc)
12:40:59 ... and the profile URI (which can be distinct from actor id)
12:41:23 ... you have the Outbox (a collection of all activities the actor has done), and also, as part of that, are reactions (each activity or created content can have a collection of Replies, Likes, and Shares)
12:41:48 ... the Inbox is everything that the actor has received through subscriptions or direct messaging. There are uploaded files. Followers and Following collections
12:42:03 ... there's the Blocks collection (of blocked users), and also a Public Key that's used for HTTP Signature requests.
12:42:23 ... what is distributed on other servers: the user's Actor ID is in others' Followers/Following lists.
12:42:40 ... there are links in various @mentions etc. Object IDs that track back to their canonical location
12:42:46 ... as well as the URIs for uploaded files
12:43:04 ... so the upshot here is -- we have a LOT of distributed data, that either lives on the account's server, or is distributed among other servers
12:43:15 ... in terms of data portability, we have two main mechanisms today for doing Data Portability
12:43:35 ... one is Domain-Based Data Portability (when you own your own domain), which lets you transfer implementations, hosting services, etc.
12:44:18 ... and the second one is Mastodon's (& others) Move action
12:44:18 ... which is a technique of moving accounts and activities to other implementations or instances
12:44:18 ... the analogy I always use is - similar to WordPress
12:44:39 ... you can Export your content, comments, accounts, etc, and then Import it on another implementation or another domain, etc.
12:44:50 ...
you can do something similar on a Mastodon server on the Fediverse.
12:44:53 present+
12:45:16 ... but the overall pattern is important. Fairly basic, but critical
12:45:38 ... there are a number of limitations with Domain-Based portability. 1) you have to own the domain, and 2) You have to run your hosting server on the fediverse
12:45:57 ... there are a number of hosting services that offer this, but you do need to use one that lets you map a domain onto the hosting system
12:46:02 ... this is a fairly high barrier for most users
12:46:14 ... the other challenge is -- we don't have a standard format for Exports / backups
12:46:36 ... we have a start on this on Mastodon - it outputs a single user's data. But there is no Import on the other side, ironically
12:46:38 ... we have an open GitHub issue on this
12:46:54 q+ to note challenges with AP/AS @-mention interop/portability https://github.com/snarfed/bridgy-fed/issues/493 and consider BAF for backup format
12:47:03 ... lastly, different implementations may use different URI patterns for Activities and content. For example, Mastodon's profile is at /users/username, other impls may use other patterns
12:47:13 [preview] [gRegorLove] #493 Add support for publishing mentions
12:47:15 ... so backing up and restoring might not map correctly
12:47:24 Not AP-specific, but micro.blog has experimented with a "blog archive format" for blog content that has a HTML format with microformats. Not directly applicable, but may be interesting: https://indieweb.org/blog_archive_format.
12:47:24 q?
12:47:29 ack eprodrom
12:47:46 tantek: no questions about technique per se,
12:48:03 tantek: I appreciate the framing of the scope of the document. I think that literally what you said belongs in the abstract of the doc
12:48:17 ... so like, just copy/paste what was scribed, into abstract! :)
12:48:36 ... I appreciate you mentioned the @-mentions in particular, interop wise
12:48:52 ...
to be specific, the way Mastodon and Friendica do @-mentions, Bridgy Fed has tried the examples in the spec & some variants, and has been unable to get @-mentions to work
12:49:22 ... for example on Bridgy, when I @-mention you, it's not able to generate a correct activity or object to notify you, specifically
12:49:30 ... I'm not sure if that's in scope, but it's one of the issues
12:49:42 Good point, Tantek.
12:50:09 pfefferl_ has joined #social
12:50:11 eprodrom: yeah, I think the general idea with portability, is that the @-mention stays, links to the old domain, but that hopefully redirects you, etc.
12:50:18 Yeah we should decouple domain portability vs. data / instance portability.
12:50:23 tantek: agreed, yeah, and there's still various challenges there
12:50:38 tantek: lastly, you mentioned there's no official backup format. There's a challenge there with blogs in general
12:50:50 ... there's a default meme in general "Just use RSS", but that doesn't always work
12:51:01 https://indieweb.org/blog_archive_format
12:51:05 ... so there's one specific format that Manton Reece worked on - blog archive format
12:51:17 ... would that suit the needs for account backup & restore?
12:51:18 q+
12:51:21 ack tantek
12:51:21 tantek, you wanted to note challenges with AP/AS @-mention interop/portability https://github.com/snarfed/bridgy-fed/issues/493 and consider BAF for backup format
12:51:23 I like the plain text, structured, ZIP-based format of the blog archive format.
12:51:28 [preview] [gRegorLove] #493 Add support for publishing mentions
12:51:34 tantek: it also includes replies, responses, etc
12:51:48 ack tantek
12:51:49 ... it's very close to modern AS2 profile
12:52:16 https://wordpress.com/support/export/
12:52:18 ref ^
12:52:36 dmitriz: also worth looking at WordPress export format
12:52:36 IIRC WP format was a lot of XML?
12:52:48 tantek: I think it informed the blog archive format, too?
I think it does more / is a superset
12:53:00 More semantics are included in blog archive format.
12:53:05 q+
12:53:19 q-
12:53:48 eprodrom: I want to mention the fact that - I believe that what Mastodon uses is a collection of AS2 data
12:53:58 ... and since that's the native format for AP, it makes a good candidate
12:54:04 ... so it's a good starting point
12:54:04 Worth discussing re: AS-encoded data for an archive format.
12:54:20 ... I'd like to talk about this second data portability technique that is used on the Fediverse right now, and that's the Move action
12:54:28 q?
12:54:43 ... it's a mechanism that's used for - primarily used to move from one Mastodon instance to another. However, there are other implementations that support the technique
12:54:56 ... it is limited in scope, but has some good outcomes that cover data portability
12:55:17 ... the mechanism is relatively simple: a user has an existing Actor at username@oldexample, then creates a new actor at username@newexample
12:55:30 ... then adds an 'alsoKnownAs' property that points at the old actor
12:55:41 FYI micro.blog has some support for Move: https://www.manton.org/2022/12/02/moving-from-mastodon.html
12:55:42 ... which denotes "I'm willing to accept Move requests from old actor"
12:55:55 ... similarly, adds a 'movedTo' property to the old actor profile
12:56:00 [preview] [Manton Reece] Moving from Mastodon to a new instance or to Micro.blog
12:56:08 ... then finally, initiates a Move activity, from the old account to the new account
12:56:10 FYI2 and Bridgy Fed is working on Move support: https://github.com/snarfed/bridgy-fed/issues/330
12:56:28 [preview] [snarfed] #330 Add account migration (Move) support
12:56:39 ... this goes out to all followers, who check & validate, then unfollow the old account, and follow the new account
12:56:47 ... once this happens, the old account's profile URI _will_ automatically redirect to the new account's URI
12:57:09 ...
there is not an automated mechanism for moving the Following list. But, Mastodon allows downloading the old following list, and importing it into the new account
12:57:17 ... it's a manual step, to download a CSV file & re-import it
12:57:47 ... the results are - we have the redirect between profile URIs, the followers list is close to what was at the old account. the following list does the export/import thing.
12:57:59 ... the old account's Following list is empty, and the network has everything updated
12:58:22 ... this technique is primarily moving the social graph. It does not cover the content, uploaded files -- those remain at the old URLs at their old IDs
12:58:37 ... if the old account's server is down, it is no longer possible to move to another account
12:58:42 ... and it does not work if the old server's account is blocked by followers
12:59:02 Interesting.
12:59:13 ... so a common occurrence -- a server gets massively de-federated, users then try to move from that server, but are unable to, because the old server is not accepted (the Move actions bounce)
12:59:17 I haven't read many docs about that. Worth highlighting more.
12:59:30 ... so, it gets de-federated due to a small percentage of bad actors, but it traps everyone else
12:59:33 timbl has joined #social
12:59:35 That feels *really* important to address.
12:59:40 ^^
12:59:47 ... I'll pause here. that covers the primary mechanisms for data portability on the Fediverse right now
12:59:48 capjamesg, feel free to q+ if you want to add to the conversation on the record
13:00:00 q+
13:00:21 ... I know that Firefish and Friendica have somewhat enhanced mechanisms for this Move practice that may move over some content, or mirror some content. I haven't tested it out, but would love to track these
13:00:22 ack eprodrom
13:00:23 q?
13:00:33 chair: dmitriz
13:01:03 capjamesg: point made with regard to people being isolated warrants a great discussion.
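
The Move checks eprodrom describes (12:55:17 to 12:56:39) can be sketched roughly as follows. This is a minimal illustration, not any implementation's actual code: it assumes the actor documents have already been fetched as plain dicts, the names `as_list` and `is_valid_move` are made up for this sketch, and the `object`/`target` layout of the Move activity is an assumption based on common Mastodon usage, since the session only says the Move goes "from the old account to the new account".

```python
from typing import Any


def as_list(value: Any) -> list:
    """Normalize a JSON-LD style value (missing, single, or list) to a list."""
    if value is None:
        return []
    return value if isinstance(value, list) else [value]


def is_valid_move(move: dict, old_actor: dict, new_actor: dict) -> bool:
    """Check the two-way link between old and new actor before acting on a Move."""
    old_id, new_id = old_actor["id"], new_actor["id"]
    return (
        move.get("type") == "Move"
        and move.get("actor") == old_id         # sent by the old account
        and move.get("object") == old_id        # assumed layout: the account being moved
        and move.get("target") == new_id        # assumed layout: where it is moving to
        and old_actor.get("movedTo") == new_id  # old profile points forward
        and old_id in as_list(new_actor.get("alsoKnownAs"))  # new profile accepts the Move
    )


# Hypothetical actors on example domains.
old = {"id": "https://old.example/users/alice",
       "movedTo": "https://new.example/users/alice"}
new = {"id": "https://new.example/users/alice",
       "alsoKnownAs": ["https://old.example/users/alice"]}
move = {"type": "Move", "actor": old["id"], "object": old["id"], "target": new["id"]}

print(is_valid_move(move, old, new))
```

A follower's server would only unfollow the old account and follow the new one when a check like this passes. It also shows why a dead or blocked old server traps users: the Move has to be sent by, and verified against, the old account.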
13:01:25 capjamesg: being left in that predicament is difficult for the average user
13:01:27 q+
13:01:32 ack capj
13:01:52 q-
13:01:57 tantek: should we track this problem with an issue?
13:01:57 Thank you, kindly!
13:02:03 eprodrom: yes
13:02:11 bumblefudge__: there are FEPs to address this
13:02:21 ack dmitriz
13:02:32 https://codeberg.org/fediverse/fep
13:02:41 dmitriz: FEPs = Fediverse Enhancement Proposals
13:02:52 ^ Extensions to the Spec (defining behaviors and/or data models not defined in the core spec)
13:02:58 dmitriz: there are 2 interesting FEPs - identity proofs, signed objects impact migration
13:03:46 dmitriz: main challenge if old server is dead or uncooperative or defederated enough that on-line data portability will not work
13:04:04 dmitriz: to address these challenges, alternate technique to do this cryptographically
13:04:19 https://codeberg.org/fediverse/fep/src/branch/main/fep/c390/fep-c390.md
13:04:19 dmitriz: signatures prove equivalence
13:04:49 (insofar as signature can be checked by discovering key material for that actorID...)
13:05:09 dmitriz: use a key or DID, sign old account, set up movedTo and alsoKnownAs, sign new account
13:05:20 (and insofar as new server lets you BYO key :D )
13:05:20 dmitriz: can claim a continuation of identity
13:05:38 dmitriz: can perform most of the activity based on equivalents
13:05:46 q+
13:05:48 q+ to ask how would you sign a dead from server and ask does it matter which direction the defederation occurs? at the from defed the destination, or at the destination defed the from?
13:06:21 dmitriz: only possible if user has key control, or old server lets you export your keys
13:06:34 dmitriz: warning and red flags on exporting private keys
13:06:56 q?
13:06:59 scribe+
13:07:01 ack eprodrom
13:07:21 eprodrom: I do want to note that although the technique - the FEP covers doing the signature, it does not cover the Move protocol
13:08:16 ...
the implication is - you can USE it for Move activity, but it doesn't describe the full procedure
13:08:37 ... so it needs more work / additional specifications of that
13:08:37 ack tantek
13:08:38 tantek, you wanted to ask how would you sign a dead from server and ask does it matter which direction the defederation occurs? at the from defed the destination, or at the
13:08:38 ... destination defed the from?
13:08:39 Thank you dmitriz for highlighting the FEP.
13:09:00 tantek: fascinating proposal. Uncovers a lot of really good use cases. Should be documented independent of a solution
13:09:09 dmitriz: 3 different failure cases: old server down, old server uncooperative, or old server defederated enough to not be cooperated with
13:09:18 Have we discussed what happens if a domain lapses and is taken over?
13:09:25 ohhh good point (re read-only servers)
13:09:26 (maybe this has never happened, but just thinking about a fail case)
13:09:38 tantek: dead server, uncooperative server, read-only server, defederation
13:10:01 @capjamesg - also good variation (I think that's a variant of "old server is down" or uncooperative), but it's a different flavor - a possibly actively hostile server
13:10:02 tantek: do you mean one or both servers are defederated (to v from, from v to, both)
13:10:22 dmitriz Yeah. Is there a case where a server becomes "untrusted" due to malicious activity?
13:10:35 tantek: read-only: archive server like Wayback Machine backup
13:10:37 @capjamesg - definitely!
13:10:50 ack tantek
13:11:08 eprodrom: that IS a technique people use for blogs, websites, etc. of scraping the Wayback Machine, and using that to recover data
13:11:24 tantek: useful to document these approaches
13:11:26 eprodrom+
13:11:27 +1 to documenting the use cases!
13:11:35 +1 Tantek.
13:12:05 ++ DID is key to bsky.
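
The failure cases listed above (13:09:09 to 13:10:35) and the options discussed for each can be laid out as a small decision table. This is only a sketch for organizing the use cases: the enum, the function name `migration_options`, and the option labels are invented for illustration and do not come from any spec or FEP. It assumes a Move needs a live, accepted old server, a signed identity proof needs the user to control a key, and archive recovery (for example scraping the Wayback Machine) is a last resort that depends on an archive existing.

```python
from enum import Enum
from typing import List


class OldServer(Enum):
    """States of the old server named in the discussion (labels are illustrative)."""
    UP = "up"
    DOWN = "down"
    UNCOOPERATIVE = "uncooperative"
    READ_ONLY = "read-only"
    DEFEDERATED = "defederated"


def migration_options(state: OldServer, user_controls_key: bool) -> List[str]:
    """Return the migration paths that remain open, most complete first."""
    options = []
    if state is OldServer.UP:
        # The Move activity must be sent by the old server and accepted by followers.
        options.append("move-activity")
    if user_controls_key:
        # Cryptographic continuity does not need the old server's cooperation.
        options.append("signed-identity-proof")
    # Last resort; only helps if public content was archived somewhere.
    options.append("recover-from-archive")
    return options


for state in OldServer:
    print(state.value, migration_options(state, user_controls_key=False))
```

With `user_controls_key=False`, every state other than `UP` leaves only archive recovery. That is the trap described earlier for users on a de-federated server.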
13:12:12 tantek: Blue Sky considered data portability from scratch and uses a DID
13:12:29 tantek: supposedly portable across servers, without actual proof since no other servers
13:12:55 good call (doing an A/B analysis on Blue Sky's identity approach and this FEP)
13:12:57 q?
13:13:00 tantek: is there a possibility of a bridge?
13:13:01 q+
13:13:04 q+
13:13:31 bumblefudge__: the way Blue Sky uses DIDs is something we have been following, also Nostr. we == DID people
13:13:36 ref: https://atproto.com/specs/did
13:14:05 > The AT Protocol uses Decentralized Identifiers (DIDs) as persistent, long-term account identifiers
13:14:18 bumblefudge__: BlueSky and Nostr are private key == identity, retrofit compatibility onto URI-identity
13:15:09 q?
13:15:13 ack bumbl
13:15:13 bumblefudge__: identity proofs FEP tries to be DID neutral, should not matter what DID method you use, can't think of DID methods that won't work, including Blue Sky DID Web, DID PKH (Ethereum wallet style), if delegate to wallet
13:15:33 q+ to suggest the requirement that users be able to easily, privately, non-destructively, test the system against various failures
13:15:49 q-
13:16:16 eprodrom: we haven't taken note of it during this conversation, but I do want to talk about it quickly, which is - the topology of the Fediverse today
13:16:20 ... in terms of the locus of control
13:16:40 ... I think there is a structure/topology that is common on the IndieWeb, in which a single implementation supports a single user, on a single domain
13:16:55 Good assessment re: IndieWeb.
13:17:02 ... all under control of that single user. On the Fediverse today (so, Mastodon, Pleroma, Firefish, etc), we have a different common topology
13:17:56 overlap with examples like WordPress where an "instance" can have one or a few users
13:17:56 ... which is - hundreds/thousands/more of users, who use a single domain, they have a weak affinity for that domain (picked it from an arbitrary list),
13:17:59 ...
with a volunteer admin, not a paid service
13:18:03 Agreed.
13:18:37 ... this is not universal, but very typical. Because of the low affinity between user and domain service, the requirement of portability (of moving from server to server) is very common
13:18:44 ... so in Mastodon world, it's very common to talk to people who have moved 5-6+ times
13:18:52 ... so the need for portability is extra high
13:19:32 Agreed.
13:19:32 q?
13:19:32 Blog posts!
13:19:32 ack eprod
13:19:32 ack eprodrom
13:19:32 +1 eprodrom big distinction in locus of control
13:19:32 q?
13:19:32 ack sandro
13:19:32 sandro, you wanted to suggest the requirement that users be able to easily, privately, non-destructively, test the system against various failures
13:19:32 ack sandro
13:19:46 ohhh man, "backups you don't test are useless" --- +1 !!!
13:20:08 "how do you test a social server profile backup?" (paraphrased from sandro)
13:20:11 sandro: a user requirement. backups that you don't test are useless. Need to know that social backups are working, without breakage or irreversible effects
13:20:38 mirroring++
13:20:38 mirroring has 1 karma over the last year
13:20:39 sandro: common example is for users on social services to have backup accounts on same service
13:20:41 q+
13:20:42 testing++
13:20:42 testing has 1 karma in this channel over the last year (3 in all channels)
13:20:54 Mirroring feels complicated to the average user?
13:21:08 sandro: excellent for peace of mind
13:21:13 capjamesg not if you call it a "backup account" which lots of IGs have
13:21:27 (and they do it manually!)
13:21:59 dmitriz: what additions will we need to add to the core data model
13:22:04 eprodrom+
13:22:07 q+
13:22:15 q-
13:22:17 ack eprod
13:22:42 eprodrom: so, I think one mechanism that Sandro alluded to here - having the ability to back up the content that's created. Preferably "hot backups"
13:22:51 ... of activities, uploaded files, social graph
13:23:19 ...
and I think Sandro was suggesting having an "alt" account type of hot backup. The other thing is of course, a static backup on an external storage
13:23:33 ... and having the location of that backup travel with activities & files as they go around the network
13:23:52 ... so that if the content is inaccessible, you can try it at this alternate location
13:23:53 q+
13:24:00 q-
13:24:13 ack capj
13:24:50 q+
13:24:52 capjamesg: writing blog post about all thought processes on this issue, let's collaborate on proposed additions
13:24:59 ack bumbl
13:25:00 q+
13:25:24 ++
13:25:26 bumblefudge__: we should start FEPs based on use cases, not solutions
13:25:28 q+ to ask where user stories should live
13:25:29 Absolutely.
13:25:36 +1 to use case first dev
13:25:44 q-
13:25:45 We should be use-case drvien.
13:25:49 *driven
13:25:52 q-
13:26:20 yea!
13:26:33 q+
13:26:36 q+
13:26:37 eprodrom: with 5 mins left in the session, I'd like to make a possibly controversial proposal -- just as we created a task force around Testing, would it make sense to take on this fairly large chunk of functionality, around Data Portability, as a task force of the Social CG?
13:26:40 ack eprodrom
13:26:41 ack eprod
13:26:52 ack capj
13:27:16 capjamesg: good to have a lead spoc for data portability
13:27:35 I'd start with a prior art review too.
13:27:58 Blog archive format, WP format, others of which we may not be aware but are used (Squarespace maybe etc.)?
13:28:05 tantek: excellent focus area for attracting group attention; broaden to include identity/account, data/posts, social graph
13:28:20 @capjamesg -- agreed, yeah
13:28:20 @capjamesg has 0 karma in this channel over the last year (119 in all channels)
13:28:20 tantek: should handle both of those examples
13:28:21 q?
13:28:23 ack tantek
13:28:26 capjamesg++ restore :)
13:28:26 capjamesg has 1 karma in this channel over the last year (120 in all channels)
13:28:27 Why thank you :)
13:28:42 q+
13:28:56 q-
13:29:09 tantek: backups should be in scope for data portability
13:29:11 whoops didn't mean to --
13:29:19 @capjamesg ++
13:29:40 eprodrom: next step would be to find a volunteer lead
13:29:40 A discussion for the mailing list
13:30:33 RRSAgent, make minutes
13:30:34 I have made the request to generate https://www.w3.org/2023/09/13-social-minutes.html eprodrom
13:31:45 Zakim: end meeting
13:31:46 Zakim, end meeting
13:31:46 As of this point the attendees have been tantek, dmitriz, eprodrom, pfefferle, pchampin, capjamesg, +
13:31:46 RRSAgent, please draft minutes
13:31:47 I have made the request to generate https://www.w3.org/2023/09/13-social-minutes.html Zakim
13:31:53 I am happy to have been of service, eprodrom; please remember to excuse RRSAgent. Goodbye
13:31:53 Zakim has left #social
13:40:43 Thank you, everyone!
14:09:16 tantek has joined #social
15:18:41 tantek has joined #social
15:31:35 dmitriz has joined #social
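
The alternate-location idea raised at 13:23:33 (a backup location that travels with activities and files, tried when the original is inaccessible) might look roughly like this. It is a sketch under stated assumptions: no such property was defined in the session, so `backupLocations` is a hypothetical name, and `fetch_with_fallback` and the in-memory stand-in for the network are illustrative only.

```python
from typing import Callable, Optional


def fetch_with_fallback(obj_ref: dict,
                        fetch: Callable[[str], Optional[dict]]) -> Optional[dict]:
    """Try the canonical id first, then each advertised backup location in order."""
    # "backupLocations" is a hypothetical property, not part of ActivityPub or AS2.
    candidates = [obj_ref["id"], *obj_ref.get("backupLocations", [])]
    for url in candidates:
        try:
            result = fetch(url)
        except OSError:
            # Treat network-level failures like a missing object and keep going.
            continue
        if result is not None:
            return result
    return None


# Stand-in for the network: the original server is gone, the backup still answers.
reachable = {
    "https://backup.example/notes/1": {"type": "Note", "content": "hello"},
}
ref = {
    "id": "https://old.example/notes/1",
    "backupLocations": ["https://backup.example/notes/1"],
}

note = fetch_with_fallback(ref, reachable.get)
print(note)
```

Passing the fetcher in as a parameter lets the fallback order be exercised without touching a live server, in the spirit of sandro's requirement that backups be testable non-destructively.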