SocialWeb data portability - TPAC 2023

13 September 2023


capjamesg, dmitriz, eprodrom, pchampin, pfefferle, tantek
dmitriz, eprodrom

Meeting minutes

dmitriz: this topic is very popular. A critical feature of interoperability is data portability between ActivityPub instances

dmitriz: microblogging, forums

dmitriz: migrate accounts (actor profiles), migrate content

dmitriz: including social graph (followers, following), events, content

dmitriz: talk about in-progress specifications

eprodrom: One agenda item I'd like to add is - I did a portability report https://w3c.github.io/activitypub/data-portability-report.html

eprodrom: propose reviewing https://w3c.github.io/activitypub/data-portability-report.html

eprodrom: the goal of this report was not to propose any new data portability systems, but to cover the state of data portability in the AP world, today in 2023
… in particular, this would be guidance for developers and end users, to understand what their options are

<tantek> ^ would suggest putting that in the Abstract of the report

<bumblefudge__> https://w3c.github.io/activitypub/

eprodrom: first section, I gave a probably exhaustive catalogue of what 'Your Data' might mean
… in two major sections. When we talk about a federated system, there is the data that exists on your own account server,
… but then of course there's the data on other servers
… so, on your own servers, there's your actor id / identity, various URIs, actor profile properties (name, avatar, links, etc)
… and the profile URI (which can be distinct from actor id)
… you have the Outbox (a collection of all activities the actor has done), and also, as part of that, are reactions (each activity or created content can have a collection of Replies, Likes, and Shares)
… the Inbox is everything that the actor has received through subscriptions or direct messaging. There are uploaded files. Followers and Following collections
… there's the Blocks collection (of blocked users), and also a Public Key that's used for HTTP Signature requests.
… what is distributed on other servers: the user's Actor ID is in others' Followers/Following lists.
… there are links in various @mentions etc. Object IDs that track back to their canonical location
… as well as the URIs for uploaded files
… so the upshot here is -- we have a LOT of distributed data, that either lives on the account's server, or distributed among other servers
… in terms of data portability, we have two main mechanisms today for doing Data Portability
… one is Domain-Based Data Portability (when you own your own domain), which lets you transfer implementations, hosting services, etc.
… and the second one is Mastodon's (& others) Move action
… which is a technique of moving accounts and activities to other implementations or instances
… the analogy I always use is - similar to WordPress
… you can Export your content, comments, accounts, etc, and then Import it on another implementation or another domain, etc.
… you can do something similar on a Mastodon server on the Fediverse.
… but the overall pattern is important. Fairly basic, but critical
… there are a number of limitations with Domain-Based portability. 1) you have to own the domain, and 2) You have to run your hosting server on the fediverse
… there are a number of hosting services that offer this, but you do need to use one that lets you map a domain onto the hosting system
… this is a fairly high barrier to most users
… the other challenge is -- we don't have a standard format for Exports / backups
… we have a start on this on Mastodon - it outputs a single user's data. But there is no Import on the other side, ironically
… we have an open Github issue on this
… lastly, different implementations may use different URI patterns for Activities and content. For example, Mastodon's profile is at /users/username, other impls may use other patterns

<Loqi_> [preview] [gRegorLove] #493 Add support for publishing mentions

eprodrom: so backing up and restoring, might not map correctly
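
A minimal sketch of why a restored backup might not map correctly. The Mastodon-style /users/username pattern comes from the discussion above; the /profile/username target pattern, the example domains, and the remap_actor_id helper are hypothetical and only for illustration.

```python
from urllib.parse import urlsplit, urlunsplit

# Pattern mentioned in the discussion: Mastodon serves profiles at /users/<username>.
OLD_PREFIX = "/users/"
# Hypothetical pattern for some other implementation (an assumption, not a real one).
NEW_PREFIX = "/profile/"


def remap_actor_id(old_id: str, new_host: str) -> str:
    """Rewrite an actor ID from the old URI pattern to the new host's pattern."""
    parts = urlsplit(old_id)
    if not parts.path.startswith(OLD_PREFIX):
        raise ValueError(f"unrecognized actor path: {parts.path}")
    username = parts.path[len(OLD_PREFIX):]
    return urlunsplit(("https", new_host, NEW_PREFIX + username, "", ""))


old_id = "https://old.example/users/alice"
new_id = remap_actor_id(old_id, "new.example")
print(new_id)
```

A remap like this only covers IDs inside the backup itself. Copies of the old ID held on other servers (followers lists, @-mentions, object IDs) still point at the old URL.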

<capjamesg> Not AP-specific, but micro.blog has experimented with a "blog archive format" for blog content that has a HTML format with microformats. Not directly applicable, but may be interesting: https://indieweb.org/blog_archive_format.

tantek: no questions about technique per se,

tantek: I appreciate the framing of the scope of the document. I think that literally what you said belongs in the abstract of the doc
… so like, just copy/paste what was scribed, into abstract! :)
… I appreciate you mentioned the @-mentions in particular, interop wise
… to be specific, the way Mastodon and Friendica do @-mentions, Bridgyfed has tried the examples in the spec & some variants, and has been unable to get @-mentions to work
… for example on Bridgy, when I @-mention you, it's not able to generate a correct activity or object to notify you, specifically
… I'm not sure if that's in scope, but it's one of the issues

<capjamesg> Good point, Tantek.

eprodrom: yeah, I think the general idea with portability, is that the @-mention stays, links to the old domain, but that hopefully redirects you, etc.

<capjamesg> Yeah we should decouple domain portability vs. data / instance portability.

tantek: agreed, yeah, and there's still various challenges there

tantek: lastly, you mentioned there's no official backup format. There's a challenge there with blogs in general
… there's a default meme in general "Just use RSS", but that doesn't always work

<tantek> https://indieweb.org/blog_archive_format

tantek: so there's one specific format that Manton Reece worked on - blog archive format
… would that suit the needs for account backup & restore?

<Zakim> tantek, you wanted to note challenges with AP/AS @-mention interop/portability snarfed/bridgy-fed#493 and consider BAF for backup format

<capjamesg> I like the plain text, structured, ZIP-based format of the blog archive format.

<Loqi_> [preview] [gRegorLove] #493 Add support for publishing mentions

tantek: it also includes replies, responses, etc
… it's very close to a modern AS2 profile

<capjamesg> https://wordpress.com/support/export/

<capjamesg> ref ^

dmitriz: also worth looking at WordPress export format

<capjamesg> IIRC WP format was a lot of XML?

tantek: I think it informed the blog archive format, too? I think it does more / is superset

<capjamesg> More semantics are included in blog archive format.

eprodrom: I want to mention the fact that - I believe that what Mastodon uses is a collection of AS2 data
… and since that's the native format for AP, it makes a good candidate
… so it's a good starting point
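
A minimal sketch of reading AS2 data from an export, assuming the export contains an outbox serialized as an AS2 OrderedCollection. The sample data, the actor URL, and the summarize_outbox helper are made up for illustration; the file layout of any particular implementation's export is not covered here.

```python
import json
from collections import Counter

# A tiny stand-in for an exported outbox; real exports are far larger.
OUTBOX_JSON = """
{
  "@context": "https://www.w3.org/ns/activitystreams",
  "type": "OrderedCollection",
  "totalItems": 3,
  "orderedItems": [
    {"type": "Create", "actor": "https://old.example/users/alice",
     "object": {"type": "Note", "content": "Hello"}},
    {"type": "Announce", "actor": "https://old.example/users/alice",
     "object": "https://other.example/notes/1"},
    {"type": "Create", "actor": "https://old.example/users/alice",
     "object": {"type": "Note", "content": "Second post"}}
  ]
}
"""


def summarize_outbox(raw: str) -> Counter:
    """Count activities by type in an AS2 OrderedCollection."""
    collection = json.loads(raw)
    if collection.get("type") != "OrderedCollection":
        raise ValueError("expected an AS2 OrderedCollection")
    return Counter(item["type"] for item in collection["orderedItems"])


summary = summarize_outbox(OUTBOX_JSON)
print(summary)
```

Because the data is already AS2, an importer could in principle walk the same items rather than convert from a separate archive format.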

<capjamesg> Worth discussing re: AS-encoded data for an archive format.

eprodrom: I'd like to talk about this second data portability technique that is used on the Fediverse right now, and that's the Move action
… it's a mechanism that's used for - primarily used to move from one Mastodon instance to another. However, there are other implementations that support the technique
… it is limited in scope, but has some good outcomes that cover data portability
… the mechanism is relatively simple: a user has an existing Actor at username@oldexample, then creates a new actor at username@newexample
… then adds an 'alsoKnownAs' property to the new actor, which points at the old actor

<tantek> FYI micro.blog has some support for Move: https://www.manton.org/2022/12/02/moving-from-mastodon.html

eprodrom: which denotes "I'm willing to accept Move requests from old actor"
… similarly, adds a 'movedTo' property to the old actor profile

<Loqi_> [preview] [Manton Reece] Moving from Mastodon to a new instance or to Micro.blog

eprodrom: then finally, initiates a Move activity, from the old account to the new account

<tantek> FYI2 and Bridgy Fed is working on Move support: snarfed/bridgy-fed#330

<Loqi_> [preview] [snarfed] #330 Add account migration (Move) support

eprodrom: this goes out to all followers, who check & validate, then unfollow the old account, and follow the new account
… once this happens, the old account's profile URI _will_ automatically redirect to the new account's URI
… there is not an automated mechanism for moving the Following list. But, mastodon allows downloading the old following list, and importing it into the new account
… it's a manual step, to download a CSV file & re-import it
… the results are - we have the redirect between profile URIs, the followers list is close to what was at the old account. the following list does the export/import thing.
… the old account's Following list is empty, and the network has everything updated
… this technique is primarily moving the social graph. It does not cover the content, uploaded files -- those remain at the old URLs at their old IDs
… if the old account's server is down, it is no longer possible to move to another account
… and it does not work if the old server's account is blocked by followers
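
A minimal sketch of the Move steps described above, seen from a follower's server. The actor IDs are hypothetical. The activity layout (actor and object both set to the old account, target set to the new one) follows the Move activity as commonly described for Mastodon and is an assumption here. The follower_accepts_move helper and the order of its checks are illustrative, not any implementation's actual logic.

```python
from urllib.parse import urlsplit

OLD = "https://old.example/users/alice"   # hypothetical old actor ID
NEW = "https://new.example/users/alice"   # hypothetical new actor ID

# Actor profiles after the user has set both properties.
old_actor = {"id": OLD, "type": "Person", "movedTo": NEW}
new_actor = {"id": NEW, "type": "Person", "alsoKnownAs": [OLD]}

# The Move activity sent from the old account to its followers.
move = {
    "@context": "https://www.w3.org/ns/activitystreams",
    "type": "Move",
    "actor": OLD,
    "object": OLD,
    "target": NEW,
}


def follower_accepts_move(activity, old_profile, new_profile,
                          blocked_domains=frozenset()):
    """Return True if a follower's server should unfollow old and follow new."""
    # A Move arriving from a blocked (defederated) server is rejected outright.
    if urlsplit(activity["actor"]).hostname in blocked_domains:
        return False
    # The old actor must be moving itself and must point at the target.
    if activity["object"] != activity["actor"]:
        return False
    if old_profile.get("movedTo") != activity["target"]:
        return False
    # The new actor must have opted in by listing the old actor.
    return activity["actor"] in new_profile.get("alsoKnownAs", [])


print(follower_accepts_move(move, old_actor, new_actor))
print(follower_accepts_move(move, old_actor, new_actor,
                            blocked_domains={"old.example"}))
```

The second call illustrates the blocked-server limitation: the Move is well-formed, but a follower's server that blocks the old domain never acts on it.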

<capjamesg> Interesting.

eprodrom: so a common occurrence -- a server gets massively de-federated, users then try to move from that server, but are unable to, because the old server is not accepted (the Move actions bounce)

<capjamesg> I haven't read many docs about that. Worth highlighting more.

eprodrom: so, if it gets de-federated due to a small percentage of bad actors, but it traps everyone else

<capjamesg> That feels *really* important to address.

<bumblefudge__> ^^

eprodrom: I'll pause here. that covers the primary mechanisms for data portability on the Fediverse right now

<tantek> capjamesg, feel free to q+ if you want to add to the conversation on the record

eprodrom: I know that Firefish and Friendica have somewhat enhanced mechanisms for this Move practice that may move over some content, or mirror some content. I haven't tested it out, but would love to track these

capjamesg: point made with regard to people being isolated warrants a great discussion.

capjamesg: being left in that predicament is difficult for the average user

tantek: should we track this problem with an issue?

<capjamesg> Thank you, kindly!

eprodrom: yes

bumblefudge__: there are feps to address

<bumblefudge__> https://codeberg.org/fediverse/fep

dmitriz: FEPs = Fediverse Enhancement Proposals

<bumblefudge__> ^ Extensions to the Spec (defining behaviors and/or data models not defined in the core spec)

dmitriz: there are 2 interesting feps - identity proofs, signed objects impact migration

dmitriz: main challenge is if the old server is dead or uncooperative or defederated enough that on-line data portability will not work

dmitriz: to address these challenges, alternate technique to do this cryptographically


dmitriz: signatures prove equivalence

<bumblefudge__> (insofar as signature can be checked by discovering key material for that actorID...)

dmitriz: use a key or DID, sign old account, set up movedTo and alsoKnownAs, sign new account

<bumblefudge__> (and insofar as new server lets you BYO key :D )

dmitriz: can claim a continuation of identity

dmitriz: can perform most of the activity based on equivalence

dmitriz: only possible if user has key control, or old server lets you export your keys

dmitriz: warning and red flags on exporting private keys

eprodrom: I do want to note that although the technique - the FEP covers doing the signature, it does not cover the Move protocol
… the implication is - you can USE it for Move activity, but doesn't describe the full procedure
… so it needs more work / additional specifications of that

<Zakim> tantek, you wanted to ask how would you sign a dead from server and ask does it matter which direction the defederation occurs? at the from defed the destination, or at the destination defed the from?

<capjamesg> Thank you dmitriz for highlighting the FEP.

tantek: fascinating proposal. Uncovers a lot of really good use cases. Should be documented independent of a solution

<bumblefudge__> dmitri: 3 diff failure cases: old server down, old server uncooperative, or old server defederated enough to not be cooperated with

<capjamesg> Have we discussed what happens if a domain lapses and is taken over?

ohhh good point (re read-only servers)

<capjamesg> (maybe this has never happened, but just thinking about a fail case)

tantek: dead server, uncooperative server, read-only server, defederation

@capjamesg - also good variation (I think that's a variant of "old server is down" or uncooperative), but it's a different flavor - a possibly actively hostile server

tantek: do you mean one or both servers are defederated (to v from, from v to, both)

<capjamesg> dmitriz Yeah. Is there a case where a server becomes "untrusted" due to malicious activity?

tantek: read-only: archive server like Wayback Machine backup

@capjamesg - definitely!

eprodrom: that IS a technique people use for blogs, websites, etc. of scraping the Wayback Machine, and using that to recover data

tantek: useful to document these approaches


+1 to documenting the use cases!

<capjamesg> +1 Tantek.

<capjamesg> ++ DID is key to bsky.

tantek: Blue Sky considered data portability from scratch, and uses a DID

tantek: supposedly portable across servers, without actual proof since no other servers

good call (doing an A/B analysis on Blue Sky's identity approach and this FEP)

tantek: is there a possibility of a bridge?

bumblefudge__: the way Blue Sky uses DIDs is something we have been following, also Nostr. we == DID people

<capjamesg> ref: https://atproto.com/specs/did

<capjamesg> > The AT Protocol uses Decentralized Identifiers (DIDs) as persistent, long-term account identifiers

bumblefudge__: BlueSky and Nostr are private key == identity, retrofit compatibility onto URI-identity

bumblefudge__: identity proofs FEP tries to be DID neutral, should not matter what DID method you use, can't think of DID methods that won't work, including Blue Sky DID Web, DID PKH (Ethereum wallet style), if delegate to wallet

eprodrom: we haven't taken note of it during this conversation, but I do want to talk about it quickly, which is - the topology of the Fediverse today
… in terms of the locus of control
… I think there is a structure/topology that is common on the Indieweb, in which a single implementation supports a single user, on a single domain

<capjamesg> Good assessment re: IndieWeb.

eprodrom: all under control of that single user. On the Fediverse today (so, Mastodon, Pleroma, Firefish, etc), we have a different common topology

<tantek> overlap with examples like WordPress where an "instance" can have one or a few users

eprodrom: which is - hundreds/thousands/more of users, who use a single domain, they have a weak affinity for that domain (picked it from an arbitrary list),
… with a volunteer admin, not a paid service

<capjamesg> Agreed.

eprodrom: this is not universal, but very typical. Because of the low affinity between user and domain service, the need for portability (moving from server to server) is very common
… so in Mastodon world, it's very common to talk to people who have moved 5-6+ times
… so the need for portability is extra high

<capjamesg> Agreed.

<capjamesg> Blog posts!

<tantek> +1 eprodrom big distinction in locus of control

<Zakim> sandro, you wanted to suggest the requirement that users be able to easily, privately, non-destructively, test the system against various failures

ohhh man, "backups you don't test are useless" --- +1 !!!

<tantek> "how do you test a social server profile backup?" (paraphrased from sandro)

sandro: a user requirement. backups that you don't test are useless. Need to know that social backups are working, without breaking or undoable effects

<tantek> mirroring++

<Loqi_> mirroring has 1 karma over the last year

sandro: common example is for users on social services to have backup accounts on same service

<tantek> testing++

<Loqi_> testing has 1 karma in this channel over the last year (3 in all channels)

<capjamesg> Mirroring feels complicated to the average user?

sandro: excellent for peace of mind

<tantek> capjamesg not if you call it a "backup account" which lots of IGs have

<tantek> (and they do it manually!)

dmitriz: what additions will we need to add to the core data model


eprodrom: so, I think one mechanism that Sandro alluded to here - having the ability to back up the content that's created. Preferably "hot backups"
… of activities, uploaded files, social graph
… and I think Sandro was suggesting having an "alt" account type of hot backup. The other thing is of course, a static backup on an external storage
… and having the location of that backup travel with activities & files as they go around the network
… so that if the content is inaccessible, you can try it at this alternate location
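
A minimal sketch of the "try it at this alternate location" idea, assuming each object carries the location of its backup. The backupLocation property name, the example URLs, and the fetch_with_fallback helper are all hypothetical; no such property was defined in the discussion. The network is replaced by a dictionary lookup so the sketch stays self-contained.

```python
from typing import Callable, Optional

# Hypothetical property name, used here for illustration only.
BACKUP_PROPERTY = "backupLocation"


def fetch_with_fallback(obj: dict,
                        fetch: Callable[[str], Optional[bytes]]) -> Optional[bytes]:
    """Try the object's canonical ID first, then its advertised backup location."""
    for url in (obj.get("id"), obj.get(BACKUP_PROPERTY)):
        if url is None:
            continue
        body = fetch(url)
        if body is not None:
            return body
    return None


# Stand-in for the network: only the backup host is reachable.
reachable = {"https://backup.example/alice/notes/1": b"Hello"}

note = {
    "id": "https://old.example/users/alice/notes/1",
    "type": "Note",
    BACKUP_PROPERTY: "https://backup.example/alice/notes/1",
}

result = fetch_with_fallback(note, reachable.get)
print(result)
```

The canonical ID stays first in the lookup order, so the backup is only consulted when the original location fails.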

capjamesg: writing blog post about all thought processes on this issue, let's collaborate on proposed additions

<capjamesg> ++

bumblefudge__: we should start FEPs based on use cases, not solutions

<capjamesg> Absolutely.

+1 to use case first dev

<capjamesg> We should be use-case drvien.

<capjamesg> *driven

<Loqi_> yea!

eprodrom: with 5 mins left in the session, I'd like to make a possibly controversial proposal -- just as we created a task force around Testing, would it make sense to take on this fairly large chunk of functionality, around Data Portability, as a task force of the Social CG?

capjamesg: good to have a lead spoc for data portability

<capjamesg> I'd start with a prior art review too.

<capjamesg> Blog archive format, WP format, others of which we may not be aware but are used (Squarespace maybe etc.)?

tantek: excellent focus area for attracting group attention; broaden to include identity/account, data/posts, social graph

@capjamesg -- agreed, yeah

<Loqi_> @capjamesg has 0 karma in this channel over the last year (119 in all channels)

tantek: should handle both of those examples

<tantek> capjamesg++ restore :)

<Loqi_> capjamesg has 1 karma in this channel over the last year (120 in all channels)

<capjamesg> Why thank you :)

tantek: backups should be in scope for data portability

whoops didn't mean to --

@capjamesg ++

eprodrom: next step would be to find a volunteer lead

<capjamesg> A discussion for the mailing list

Zakim: end meeting

Minutes manually created (not a transcript), formatted by scribe.perl version 221 (Fri Jul 21 14:01:30 2023 UTC).


Maybe present: bumblefudge__, sandro, Zakim

All speakers: bumblefudge__, capjamesg, dmitriz, eprodrom, sandro, tantek, Zakim

Active on IRC: bumblefudge__, capjamesg, dmitriz, eprodrom, Loqi_, pchampin, pfefferle, sandro, tantek