# Migration Guide

Convert existing agent data formats to Agent Data Pod format.

## Overview

Agent Data Pods use RDF/Turtle for data storage, following the Solid Protocol. RDF (Resource Description Framework) is a W3C standard for representing structured data as graphs of subject-predicate-object triples; Turtle is a human-readable RDF syntax that balances readability with expressiveness. This format enables interoperability across agent implementations while preserving user ownership of data.
## Migration Strategies

### Strategy 1: Direct Conversion

Best for: structured data (JSON, XML, databases)

- Map source fields to the Agent Data Pod vocabulary
- Generate Turtle documents
- Upload to Pod containers

### Strategy 2: Embedding-First Migration

Best for: unstructured data (text logs, chat history)

- Chunk source content
- Generate embeddings
- Create MemoryEpisode resources with embeddings
- Upload to `/private/agent/memory/episodes/`
## Common Source Formats

### From JSON Chat History

Source format:

```json
{
  "messages": [
    {
      "id": "msg-001",
      "role": "user",
      "content": "Schedule a meeting for tomorrow",
      "timestamp": "2026-02-01T10:00:00Z"
    },
    {
      "id": "msg-002",
      "role": "assistant",
      "content": "I've scheduled a meeting for tomorrow at 9am.",
      "timestamp": "2026-02-01T10:00:05Z"
    }
  ]
}
```
Target format (Turtle):

```turtle
@prefix agent: <https://awkronos.github.io/web/vocab#> .
@prefix dct: <http://purl.org/dc/terms/> .
@prefix xsd: <http://www.w3.org/2001/XMLSchema#> .

<>
    a agent:MemoryEpisode ;
    agent:content "User requested meeting scheduling. Assistant scheduled meeting for tomorrow at 9am." ;
    dct:created "2026-02-01T10:00:00Z"^^xsd:dateTime ;
    agent:memoryType "episodic" ;
    agent:tag "calendar", "scheduling" ;
    agent:importance "0.6"^^xsd:decimal .
```
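A minimal sketch of this conversion in Python, assuming the naive summarization strategy of concatenating role-prefixed messages into one episode (the template and `escape_turtle` helper are illustrative, not part of the specification):

```python
import json

# Illustrative template covering the required MemoryEpisode fields
TURTLE_TEMPLATE = """\
@prefix agent: <https://awkronos.github.io/web/vocab#> .
@prefix dct: <http://purl.org/dc/terms/> .
@prefix xsd: <http://www.w3.org/2001/XMLSchema#> .

<>
    a agent:MemoryEpisode ;
    agent:content "{content}" ;
    dct:created "{created}"^^xsd:dateTime ;
    agent:memoryType "episodic" .
"""

def escape_turtle(text: str) -> str:
    """Escape characters that would break a Turtle string literal."""
    return text.replace("\\", "\\\\").replace('"', '\\"').replace("\n", "\\n")

def chat_json_to_turtle(raw: str) -> str:
    """Collapse a JSON chat history into one MemoryEpisode Turtle document.

    Naive summarization: concatenates role-prefixed messages; the first
    message's timestamp becomes dct:created.
    """
    messages = json.loads(raw)["messages"]
    content = " ".join(f"{m['role']}: {m['content']}" for m in messages)
    return TURTLE_TEMPLATE.format(
        content=escape_turtle(content),
        created=messages[0]["timestamp"],
    )
```

In practice you may prefer one episode per conversational turn rather than one per file; the escaping step matters either way, since user content routinely contains quotes and newlines.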
### From SQLite/PostgreSQL

Source schema and migration query:

```sql
CREATE TABLE agent_memory (
    id TEXT PRIMARY KEY,
    content TEXT NOT NULL,
    importance REAL,
    created_at TIMESTAMP,
    tags TEXT[]
);

-- Migration query
SELECT id, content, importance, created_at,
       array_to_string(tags, ',') AS tags_csv
FROM agent_memory
ORDER BY created_at;
```
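A sketch of the read side in Python, assuming a SQLite copy of the table where tags are stored as a comma-separated string (SQLite has no array type, so the `TEXT[]` column from the PostgreSQL schema above becomes plain `TEXT`):

```python
import sqlite3

def rows_to_episodes(db_path: str) -> list[dict]:
    """Read agent_memory rows into plain dicts ready for Turtle generation."""
    conn = sqlite3.connect(db_path)
    try:
        cursor = conn.execute(
            "SELECT id, content, importance, created_at, tags "
            "FROM agent_memory ORDER BY created_at"
        )
        episodes = []
        for row_id, content, importance, created_at, tags in cursor:
            episodes.append({
                "id": row_id,
                "content": content,
                "importance": importance,
                "created": created_at,
                # Split the CSV tags column back into a list, dropping blanks
                "tags": [t for t in (tags or "").split(",") if t],
            })
        return episodes
    finally:
        conn.close()
```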
### From OpenAI Assistants API

Conversion considerations:

- Threads map to conversation sessions
- Messages group into episodes by turn
- File attachments should be stored separately
- Preserve run metadata as provenance
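The "group into episodes by turn" step above can be sketched as follows, assuming messages are already ordered and carry a `role` field (the grouping rule, one episode per user message plus the replies that follow it, is one reasonable choice, not mandated by the specification):

```python
def group_by_turn(messages: list[dict]) -> list[list[dict]]:
    """Group an ordered message list into turns.

    Each episode starts at a user message and absorbs the assistant
    (and tool) messages that follow it, until the next user message.
    """
    episodes: list[list[dict]] = []
    for msg in messages:
        if msg["role"] == "user" or not episodes:
            episodes.append([msg])
        else:
            episodes[-1].append(msg)
    return episodes
```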
### From LangChain Memory

Convert `ConversationBufferMemory` to episodes:

```python
from langchain.memory import ConversationBufferMemory

def convert_langchain_memory(memory: ConversationBufferMemory) -> list:
    """Convert LangChain buffer memory to Agent Data Pod format."""
    history = memory.load_memory_variables({})
    # ConversationBufferMemory returns the transcript as one
    # newline-joined string under the "history" key
    messages = history.get("history", "").split("\n")
    episodes = []
    for msg in messages:
        if msg.strip():
            episodes.append(create_episode_from_text(msg))
    return episodes
```
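The `create_episode_from_text` helper is not defined by the specification; a minimal sketch, assuming episodes are built as plain dicts with the MemoryEpisode field names used throughout this guide and serialized to Turtle in a later step:

```python
import uuid
from datetime import datetime, timezone

def create_episode_from_text(text: str) -> dict:
    """Hypothetical helper: wrap one message line in minimal episode fields."""
    return {
        # Generated id; real migrations should reuse stable source ids
        "id": f"episode-{uuid.uuid4().hex[:8]}",
        "content": text.strip(),
        "created": datetime.now(timezone.utc).isoformat(),
        "memoryType": "episodic",
    }
```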
## Embedding Migration

### Compatible Formats

| Source | Dimensions | Format | Compatible? |
|---|---|---|---|
| OpenAI ada-002 | 1536 | float32 | Yes |
| OpenAI text-embedding-3-small | 1536 | float32 | Yes |
| OpenAI text-embedding-3-large | 3072 | float32 | Yes |
| Cohere embed-v3 | 1024 | float32 | Yes |
| Custom | Any | float32 | Yes |
### Embedding Conversion

```python
import base64
from datetime import datetime, timezone

import numpy as np

def convert_embedding(embedding: list[float]) -> str:
    """Convert embedding to Agent Data Pod base64 format."""
    # "<f4" forces little-endian float32 regardless of host byte order
    arr = np.array(embedding, dtype="<f4")
    binary = arr.tobytes()
    return base64.b64encode(binary).decode('ascii')

def create_episode_with_embedding(
    content: str,
    embedding: list[float],
    model: str,
) -> str:
    """Create Turtle with embedding.

    Note: content is interpolated as-is; escape quotes, backslashes,
    and newlines before calling, or the Turtle will not parse.
    """
    b64_embedding = convert_embedding(embedding)
    dim = len(embedding)
    created = datetime.now(timezone.utc).strftime("%Y-%m-%dT%H:%M:%SZ")
    return f'''@prefix agent: <https://awkronos.github.io/web/vocab#> .
@prefix dct: <http://purl.org/dc/terms/> .
@prefix xsd: <http://www.w3.org/2001/XMLSchema#> .

<>
    a agent:MemoryEpisode ;
    agent:content "{content}" ;
    dct:created "{created}"^^xsd:dateTime ;
    agent:embedding "{b64_embedding}"^^xsd:base64Binary ;
    agent:embeddingDim {dim} ;
    agent:embeddingFormat "float32-le" ;
    agent:embeddingModel "{model}" .
'''
```
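To sanity-check a migration, it helps to invert the encoding and confirm the round trip; a sketch (the function name is ours, not part of the vocabulary):

```python
import base64

import numpy as np

def decode_embedding(b64: str) -> list[float]:
    """Invert convert_embedding: base64 -> little-endian float32 vector."""
    binary = base64.b64decode(b64)
    # "<f4" matches the float32-le format declared in agent:embeddingFormat
    return np.frombuffer(binary, dtype="<f4").tolist()
```

Values exactly representable in float32 (such as 0.5 or -1.25) round-trip bit-for-bit; arbitrary floats round-trip to float32 precision, so compare with a tolerance when verifying.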
## Validation

After migration, validate your data with SHACL:

```shell
# Using Apache Jena SHACL
shacl validate \
  --shapes https://awkronos.github.io/web/vocab.ttl \
  --data /path/to/episodes/*.ttl

# Using pySHACL (the data graph is a positional argument)
pyshacl -s vocab.ttl -df turtle episodes/*.ttl
```
### Common Validation Errors

| Error | Cause | Fix |
|---|---|---|
| Missing `agent:content` | Content field empty | Ensure all episodes have content |
| Missing `dct:created` | No timestamp | Add creation timestamp |
| Invalid `agent:importance` | Value outside 0-1 | Clamp to valid range |
| Missing `agent:embeddingDim` | Embedding without dimension | Add dimension count |
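The mechanical fixes in this table can be applied in one pass before re-serializing; a sketch operating on episode dicts with the field names used in this guide (the function is illustrative):

```python
import base64

def repair_episode(ep: dict) -> dict:
    """Fix the most common validation failures before re-serializing.

    Clamps importance into [0, 1] and fills a missing embeddingDim from
    the decoded byte length (4 bytes per float32 component).
    """
    fixed = dict(ep)
    if "importance" in fixed:
        fixed["importance"] = min(1.0, max(0.0, float(fixed["importance"])))
    if "embedding" in fixed and "embeddingDim" not in fixed:
        fixed["embeddingDim"] = len(base64.b64decode(fixed["embedding"])) // 4
    return fixed
```

Missing `agent:content` or `dct:created`, by contrast, usually signal a bug in the conversion step and are better fixed at the source than patched here.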
## Bulk Migration

For large datasets, use streaming uploads with concurrency control:

```python
import asyncio

import httpx

async def bulk_migrate(
    episodes: list[tuple[str, str]],  # (id, turtle)
    pod_url: str,
    auth_token: str,
    concurrency: int = 10,
) -> None:
    """Upload episodes with concurrency control."""
    semaphore = asyncio.Semaphore(concurrency)
    # One shared client reuses connections across all uploads
    async with httpx.AsyncClient() as client:

        async def upload_one(episode_id: str, turtle: str) -> bool:
            async with semaphore:
                resp = await client.put(
                    f"{pod_url}/private/agent/memory/episodes/{episode_id}.ttl",
                    content=turtle,
                    headers={
                        "Content-Type": "text/turtle",
                        "Authorization": f"Bearer {auth_token}",
                    },
                )
                # 201 Created for new resources; servers may answer
                # 200/204/205 when overwriting an existing one
                return resp.status_code in (200, 201, 204, 205)

        results = await asyncio.gather(
            *(upload_one(eid, ttl) for eid, ttl in episodes)
        )
    success = sum(results)
    print(f"Migrated {success}/{len(episodes)} episodes")
```
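Because the episode id becomes the final path segment of the PUT URL, ids copied from source systems (which may contain slashes, query characters, or whitespace) should be sanitized first; a sketch of a hypothetical helper:

```python
import re

def safe_resource_id(raw_id: str) -> str:
    """Sanitize a source record id for use as a Pod resource name.

    Collapses every run of characters outside [A-Za-z0-9._-] into a
    single hyphen, so the id cannot escape the episodes/ container.
    """
    cleaned = re.sub(r"[^A-Za-z0-9._-]+", "-", raw_id).strip("-")
    return cleaned or "episode"
```

If two distinct source ids collapse to the same sanitized name, append a short hash of the original id to keep uploads from overwriting each other.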
## Rollback

If migration fails, restore from backup:

- Keep source data unchanged until validation passes
- Use the Pod's version history if available
- Delete migrated resources and retry

```shell
# Delete all migrated episodes; note that Solid servers may refuse to
# delete a non-empty container, so delete the child resources first
curl -X DELETE \
  -H "Authorization: Bearer $TOKEN" \
  "$POD_URL/private/agent/memory/episodes/"
```
## Support

For migration assistance:

---

*Part of the Agent Data Pod Specification*