Charlie Works - Design Issues

We describe the development of “Charlie,” an AI assistant that works for the user by leveraging personal data securely stored in a Solid Pod. Building on a 2017 proposal for a trustworthy and user-centric AI, the work highlights progress made by engineers at Inrupt in late 2024. Using a simulated dataset for a fictional user, “Bob,” the team integrated an advanced large language model (LLM) with Bob’s personal data to demonstrate the transformative potential of combining AI and Solid Pods.

By accessing rich, structured, and personalized data, Charlie provided responses far superior to generic AI systems, as shown in a use case involving running shoe recommendations tailored to Bob’s fitness and lifestyle data. This personalization exemplifies the next level of AI capability, offering unprecedented usefulness while maintaining user trust.

We underscore the critical role of robust data infrastructure, including Solid Pods and the semantic web, in driving AI systems, and the dual role of AI in both populating and leveraging linked data stores, paving the way for a dynamic ecosystem where data graphs mediate AI interactions. We paint an exciting vision for integrating personal data, semantic web principles, and advanced AI to create tools that truly serve users.

In 2017 I wrote a piece [CH] about how I wanted an AI that worked for me. An AI which I could trust, and with which I could share all my personal data, which would therefore be much more effective than an AI which I did not trust, and which I could trust more than an AI which would try to sell me things.

Toward the end of 2024, engineers at Inrupt had a go at building Charlie. Crucially, they made a fake Bob, the user who uses Charlie in the original blog, and created a simulated data set for Bob's pod. They then took Claude [CL], the large language model from Anthropic, and designed a system through which Bob could run the LLM and pass it his data wallet (Solid Pod).

The Charlie app has a button which allows Bob to give or not give consent. Here it is with consent not granted.

Question 1

I need new running shoes. What should I buy?

Claude Generic Response without access:

Readers may be familiar with the sort of response given when there is no access to private data. It shows a wide knowledge of things public, and good logic in laying out the options in general. Claude explicitly says it needs to know more about Bob.

I'll help you find the right running shoes, but I'll need some key information to make a better recommendation. Could you tell me:

What type of running do you do? (road running, trail running, casual jogging, etc.)
Do you have any specific foot characteristics? (flat feet, high arches, pronation issues)
What's your budget range?
Do you have any current or previous running shoes that you've liked or disliked?

This information will help me suggest shoes that best match your specific needs.

Also, it's worth noting that while I can provide recommendations based on generally well-reviewed shoes and common running needs, the absolute best way to get the right running shoes is to visit a specialty running store where they can analyze your gait and let you try different options. They often provide free gait analysis and can watch how you run to suggest the most appropriate shoes for your specific running style.

Would you mind sharing those details so I can provide more targeted recommendations?

Charlie Response

When Bob flips the consent switch to the other position and asks the same question, Charlie now has access to his personal data, and the results are very different:

Based on your profile, I can make a personalized recommendation for running shoes. Here's what I know about your running habits:

You're a regular runner with:

5:29 pace (which is quite fast!)
Regular Green Lake Loop runs (known for mixed terrain)
225.3 miles YTD (averaging about 75 miles/month)
Morning runner (6:30 AM starts)
Some elevation in your routes (125ft gain noted)

Relevant physical stats:

Height: 5'7"
Weight: 130 lbs
Regular runs: ~5 miles
Healthy metrics overall

Given your profile, I'd recommend:

Top Picks:

Nike Vaporfly 3 or Alphafly 3

Perfect for your fast pace
Great for mixed distance runs
Excellent energy return
Ideal for runners in your weight class
Works well on Seattle's wet pavements

Saucony Endorphin Speed 3

Versatile for both training and faster runs
Good for your daily Green Lake loops
More durable for daily training
More affordable than the Vaporflys
Great in wet conditions

Brooks Ghost 15

Local Seattle company (aligns with your support of local businesses)
Perfect for mixed terrain around Green Lake
Good for your mileage volume
Very durable for daily training
Excellent wet weather performance

Additional Considerations:

Buy from REI (you're a regular customer there, and you can use your Chase Sapphire Preferred for points)
Consider getting fitted at Seattle's Sound Runners or Fleet Feet
Time to buy: Your Strava data suggests you're putting consistent mileage, so it's a good time to invest in new shoes

Charlie bases the answer on its knowledge of Bob's running from his fitness data, but also other buying preferences presumably evident from financial data.
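
The difference between the two responses comes down to whether Bob's pod data is placed in the model's context. A minimal sketch of such a consent gate follows, in Python. The names PodResource and build_prompt, the prompt layout, and the sample records are hypothetical, chosen for illustration; this is not Inrupt's implementation, and the call to the LLM itself is left out.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class PodResource:
    """One item of personal data read from the pod (hypothetical shape)."""
    source: str  # e.g. "fitness" or "finance"
    text: str    # a human-readable rendering of the data


def build_prompt(question: str, resources: list[PodResource], consent: bool) -> str:
    """Return the text that would be sent to the LLM.

    Pod data is included only when the user has granted consent;
    otherwise the model sees the bare question.
    """
    if not consent or not resources:
        return question
    context_lines = [f"- [{r.source}] {r.text}" for r in resources]
    return (
        "Context from the user's pod (shared with consent):\n"
        + "\n".join(context_lines)
        + "\n\nQuestion: "
        + question
    )


# Simulated data in the spirit of the example above.
bob_pod = [
    PodResource("fitness", "Regular Green Lake Loop runs, about 5 miles each"),
    PodResource("finance", "Regular customer at REI"),
]
question = "I need new running shoes. What should I buy?"

generic_prompt = build_prompt(question, bob_pod, consent=False)
charlie_prompt = build_prompt(question, bob_pod, consent=True)
```

With consent off, generic_prompt is just the question, so the model can only ask for more detail. With consent on, the same question arrives with the pod facts attached.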

Clearly Charlie, in the mode with access to Bob's personal data, is dramatically more useful than in the mode without. Even though a couple of years ago people were amazed at how useful ChatGPT and followers like Claude, Pi, and so on were, this is up another level. This is what I want.

There are disclaimers, of course, around the fact that this is all based on simulated data. The data wallet, the pod which Bob shares with the LLM as context, is not that of a real person. It has simulated data from banks, from Strava, and so on, and building a real Charlie which anyone can use requires a lot more work. Extracting data from the world of APIs into a pod is a lot of work. But of course it is work which AI can help with. AI can build apps to import data of all kinds into your pod.
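
To give a feel for what such an import involves, here is a minimal sketch, assuming a JSON activity record of the kind a fitness API might return, rendered as Turtle ready to be stored in a pod. The record fields, the fit: vocabulary at example.org, and the pod URL are all made up for illustration; a real importer would use the actual API schema and an agreed ontology.

```python
import json

# Hypothetical vocabulary; a real importer would use an agreed ontology.
FIT = "https://example.org/fitness#"
XSD = "http://www.w3.org/2001/XMLSchema#"


def activity_to_turtle(activity: dict, pod_base: str) -> str:
    """Render one activity record as a small Turtle document."""
    subject = f"<{pod_base}activities/{activity['id']}>"
    # JSON string escaping is adequate for simple names in this sketch.
    name = json.dumps(activity["name"])
    distance = activity["distance_miles"]
    start = activity["start_time"]
    lines = [
        f"@prefix fit: <{FIT}> .",
        f"@prefix xsd: <{XSD}> .",
        "",
        f"{subject} a fit:Run ;",
        f"    fit:name {name} ;",
        f'    fit:distanceMiles "{distance}"^^xsd:decimal ;',
        f'    fit:startTime "{start}"^^xsd:dateTime .',
    ]
    return "\n".join(lines) + "\n"


# A simulated record, not real data.
sample = {
    "id": "run-001",
    "name": "Green Lake Loop",
    "distance_miles": 5.1,
    "start_time": "2024-11-04T06:30:00-08:00",
}
turtle = activity_to_turtle(sample, "https://bob.example/")
```

Writing the result to a pod would then be an authenticated HTTP PUT with content type text/turtle, which this sketch leaves out.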

From the system design point of view some things are very clear.

The data infrastructure is, as always, critical
The pod full of semantic web data is extremely powerful in driving the LLM
AI is used both in feeding data into the data graph and in using the data graph as context
The data graph becomes the medium of communication between AIs
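
As a toy illustration of the last two points, the sketch below has one function standing in for an AI that writes facts into a graph and another standing in for an AI that reads them back as context. The DataGraph class, the agent functions, and the predicate names are hypothetical, and a set of Python tuples stands in for a real RDF store.

```python
from typing import Optional

Triple = tuple[str, str, str]


class DataGraph:
    """A tiny in-memory set of (subject, predicate, object) triples."""

    def __init__(self) -> None:
        self._triples: set[Triple] = set()

    def add(self, s: str, p: str, o: str) -> None:
        self._triples.add((s, p, o))

    def match(self, s: Optional[str] = None, p: Optional[str] = None,
              o: Optional[str] = None) -> list[Triple]:
        """Return matching triples in sorted order; None is a wildcard."""
        return sorted(
            t for t in self._triples
            if (s is None or t[0] == s)
            and (p is None or t[1] == p)
            and (o is None or t[2] == o)
        )


def importer_agent(graph: DataGraph) -> None:
    """Stands in for an AI that extracts facts from an API into the graph."""
    graph.add("bob", "runsAt", "Green Lake Loop")
    graph.add("bob", "shopsAt", "REI")


def advisor_agent(graph: DataGraph, person: str) -> str:
    """Stands in for an AI that reads the graph as context for a question."""
    facts = graph.match(s=person)
    return "\n".join(f"{s} {p} {o}" for s, p, o in facts)


graph = DataGraph()
importer_agent(graph)
context = advisor_agent(graph, "bob")
```

The two agents never call each other. Everything the advisor knows about Bob arrives through the graph, which is the sense in which the graph is the medium of communication.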

Exciting times.

References

CH: Charlie, the AI which works for you. Tim Berners-Lee, Design Issues, 2017