Migration Guide

Convert existing agent data formats to Agent Data Pod format.


Overview

Agent Data Pods use RDF/Turtle for data storage, following the Solid Protocol. RDF (Resource Description Framework) is a W3C standard for representing structured data as graphs of subject-predicate-object triples. Turtle is a human-readable syntax for RDF that balances readability with expressiveness. This format enables interoperability across agent implementations while preserving user ownership of their data.


Migration Strategies

Strategy 1: Direct Conversion

Best for: Structured data (JSON, XML, databases)

  1. Map source fields to Agent Data Pod vocabulary
  2. Generate Turtle documents
  3. Upload to Pod containers
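Step 1 of this strategy can be sketched as a simple field-mapping table. The source field names below (`created_at`, `score`, `labels`) are hypothetical, stand-ins for whatever your source schema uses:

```python
# Illustrative mapping from a hypothetical source schema to pod vocabulary terms
FIELD_MAP = {
    "content": "agent:content",
    "created_at": "dct:created",
    "score": "agent:importance",
    "labels": "agent:tag",
}

def map_fields(record: dict) -> dict:
    """Rename source fields to Agent Data Pod vocabulary terms (step 1).

    Fields with no mapping are dropped; extend FIELD_MAP to keep them.
    """
    return {FIELD_MAP[key]: value for key, value in record.items() if key in FIELD_MAP}
```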

Strategy 2: Embedding-First Migration

Best for: Unstructured data (text logs, chat history)

  1. Chunk source content
  2. Generate embeddings
  3. Create MemoryEpisode resources with embeddings
  4. Upload to /private/agent/memory/episodes/
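Step 1 (chunking) can be sketched as follows. The 800-character chunk size and 100-character overlap are illustrative defaults, not values mandated by the specification:

```python
def chunk_text(text: str, max_chars: int = 800, overlap: int = 100) -> list[str]:
    """Split unstructured text into overlapping chunks for embedding."""
    if max_chars <= overlap:
        raise ValueError("max_chars must exceed overlap")
    chunks = []
    start = 0
    while start < len(text):
        chunks.append(text[start:start + max_chars])
        # Step back by `overlap` so adjacent chunks share context
        start += max_chars - overlap
    return chunks
```

Each chunk then goes through your embedding model (step 2) before being wrapped in a MemoryEpisode resource (step 3).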

Common Source Formats

From JSON Chat History

Source format

JSON
{
  "messages": [
    {
      "id": "msg-001",
      "role": "user",
      "content": "Schedule a meeting for tomorrow",
      "timestamp": "2026-02-01T10:00:00Z"
    },
    {
      "id": "msg-002",
      "role": "assistant",
      "content": "I've scheduled a meeting for tomorrow at 9am.",
      "timestamp": "2026-02-01T10:00:05Z"
    }
  ]
}

Target format (Turtle)

Turtle
@prefix agent: <https://awkronos.github.io/web/vocab#> .
@prefix dct: <http://purl.org/dc/terms/> .
@prefix xsd: <http://www.w3.org/2001/XMLSchema#> .

<>
    a agent:MemoryEpisode ;
    agent:content "User requested meeting scheduling. Assistant scheduled meeting for tomorrow at 9am." ;
    dct:created "2026-02-01T10:00:00Z"^^xsd:dateTime ;
    agent:memoryType "episodic" ;
    agent:tag "calendar", "scheduling" ;
    agent:importance "0.6"^^xsd:decimal .
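A minimal conversion sketch for this source format follows. Summarising the message pair into a single content string is reduced here to a naive role-prefixed join, and `escape_turtle` is a helper introduced for illustration (Turtle string literals must have backslashes and quotes escaped):

```python
import json

def escape_turtle(value: str) -> str:
    """Escape backslashes and quotes for use inside a Turtle string literal."""
    return value.replace("\\", "\\\\").replace('"', '\\"')

def messages_to_episode(raw: str) -> str:
    """Convert a JSON chat export into a MemoryEpisode Turtle document."""
    messages = json.loads(raw)["messages"]
    # Naive summary: concatenate role-prefixed messages into one content string
    content = " ".join(f"{m['role']}: {m['content']}" for m in messages)
    created = messages[0]["timestamp"]
    return f'''@prefix agent: <https://awkronos.github.io/web/vocab#> .
@prefix dct: <http://purl.org/dc/terms/> .
@prefix xsd: <http://www.w3.org/2001/XMLSchema#> .

<>
    a agent:MemoryEpisode ;
    agent:content "{escape_turtle(content)}" ;
    dct:created "{created}"^^xsd:dateTime ;
    agent:memoryType "episodic" .
'''
```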

From SQLite/PostgreSQL

Source schema

SQL
CREATE TABLE agent_memory (
    id TEXT PRIMARY KEY,
    content TEXT NOT NULL,
    importance REAL,
    created_at TIMESTAMP,
    tags TEXT[]
);

-- Migration query
SELECT id, content, importance, created_at,
       array_to_string(tags, ',') as tags_csv
FROM agent_memory
ORDER BY created_at;
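A reading-side sketch of this migration, using the stdlib sqlite3 module. It assumes an SQLite variant of the schema in which tags are already stored as a CSV column (`tags_csv`), matching what the PostgreSQL query above produces with `array_to_string`:

```python
import sqlite3

def rows_to_episodes(db_path: str) -> list[tuple[str, dict]]:
    """Read agent_memory rows and map them to episode dicts keyed by id."""
    conn = sqlite3.connect(db_path)
    try:
        rows = conn.execute(
            "SELECT id, content, importance, created_at, tags_csv "
            "FROM agent_memory ORDER BY created_at"
        ).fetchall()
    finally:
        conn.close()
    episodes = []
    for row_id, content, importance, created_at, tags_csv in rows:
        episodes.append((row_id, {
            "content": content,
            "importance": importance,
            "created": created_at,
            "tags": tags_csv.split(",") if tags_csv else [],
        }))
    return episodes
```

Each dict can then be serialised to Turtle with whatever episode-templating helper your pipeline uses.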

From OpenAI Assistants API

Conversion considerations:

From LangChain Memory

Convert ConversationBufferMemory to episodes:

Python
from langchain.memory import ConversationBufferMemory

def convert_langchain_memory(memory) -> list:
    """Convert LangChain memory to Agent Data Pod format."""
    history = memory.load_memory_variables({})
    messages = history.get("history", "").split("\n")

    episodes = []
    for msg in messages:
        if msg.strip():
            episode = create_episode_from_text(msg)
            episodes.append(episode)

    return episodes

Embedding Migration

Compatible Formats

Embedding model compatibility matrix

  Source                         Dimensions  Format   Compatible?
  OpenAI ada-002                 1536        float32  Yes
  OpenAI text-embedding-3-small  1536        float32  Yes
  OpenAI text-embedding-3-large  3072        float32  Yes
  Cohere embed-v3                1024        float32  Yes
  Custom                         Any         float32  Yes
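A quick sanity check against this matrix can catch a truncated or mismatched vector before upload. The model names below are assumed identifiers for the rows above; custom models are accepted at any non-zero dimension:

```python
# Known embedding dimensions from the compatibility matrix above
KNOWN_DIMS = {
    "text-embedding-ada-002": 1536,
    "text-embedding-3-small": 1536,
    "text-embedding-3-large": 3072,
    "embed-v3": 1024,
}

def check_dimensions(model: str, embedding: list[float]) -> bool:
    """Verify an embedding's length matches the expected model dimension.

    Unknown (custom) models are accepted as long as the vector is non-empty.
    """
    expected = KNOWN_DIMS.get(model)
    if expected is None:
        return len(embedding) > 0
    return len(embedding) == expected
```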

Embedding Conversion

Python
import base64
from datetime import datetime, timezone

import numpy as np

def convert_embedding(embedding: list[float]) -> str:
    """Convert embedding to Agent Data Pod base64 format."""
    # Force little-endian float32 regardless of host byte order
    arr = np.array(embedding, dtype='<f4')
    binary = arr.tobytes()
    return base64.b64encode(binary).decode('ascii')

def create_episode_with_embedding(
    content: str,
    embedding: list[float],
    model: str
) -> str:
    """Create Turtle with embedding."""
    b64_embedding = convert_embedding(embedding)
    dim = len(embedding)
    # Escape backslashes and quotes so content is a valid Turtle literal
    safe_content = content.replace('\\', '\\\\').replace('"', '\\"')
    created = datetime.now(timezone.utc).strftime('%Y-%m-%dT%H:%M:%SZ')

    return f'''@prefix agent: <https://awkronos.github.io/web/vocab#> .
@prefix dct: <http://purl.org/dc/terms/> .
@prefix xsd: <http://www.w3.org/2001/XMLSchema#> .

<>
    a agent:MemoryEpisode ;
    agent:content "{safe_content}" ;
    dct:created "{created}"^^xsd:dateTime ;
    agent:embedding "{b64_embedding}"^^xsd:base64Binary ;
    agent:embeddingDim {dim} ;
    agent:embeddingFormat "float32-le" ;
    agent:embeddingModel "{model}" .
'''
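To confirm the encoding round-trips, the base64 payload can be decoded with the stdlib struct module; the `<{count}f` format string spells out the same little-endian float32 layout that agent:embeddingFormat ("float32-le") declares:

```python
import base64
import struct

def decode_embedding(b64: str) -> list[float]:
    """Decode a base64 float32-le embedding back into a list of floats."""
    binary = base64.b64decode(b64)
    count = len(binary) // 4  # 4 bytes per float32
    return list(struct.unpack(f"<{count}f", binary))
```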

Validation

After migration, validate your data with SHACL:

Shell
# Using Apache Jena SHACL
shacl validate \
  --shapes https://awkronos.github.io/web/vocab.ttl \
  --data /path/to/episodes/*.ttl

# Using pySHACL (takes one data file per invocation)
for f in episodes/*.ttl; do
  pyshacl -s vocab.ttl -df turtle "$f"
done

Common Validation Errors

Common SHACL validation errors and solutions

  Error                       Cause                        Fix
  Missing agent:content       Content field empty          Ensure all episodes have content
  Missing dct:created         No timestamp                 Add creation timestamp
  Invalid agent:importance    Value outside 0-1            Clamp to valid range
  Missing agent:embeddingDim  Embedding without dimension  Add dimension count
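The fixes in the table can be applied automatically before upload. This sketch assumes episodes are held as plain dicts prior to Turtle serialisation (the key names are illustrative, mirroring the vocabulary terms):

```python
from datetime import datetime, timezone

def repair_episode(episode: dict) -> dict:
    """Apply the fixes from the validation-error table to an episode dict."""
    fixed = dict(episode)
    # Missing agent:content: there is no safe placeholder, so fail loudly
    if not fixed.get("content"):
        raise ValueError("episode has no content")
    # Missing dct:created: add a timestamp
    if not fixed.get("created"):
        fixed["created"] = datetime.now(timezone.utc).strftime("%Y-%m-%dT%H:%M:%SZ")
    # Invalid agent:importance: clamp to the valid 0-1 range
    if "importance" in fixed:
        fixed["importance"] = min(1.0, max(0.0, float(fixed["importance"])))
    # Missing agent:embeddingDim: derive it from the embedding itself
    if "embedding" in fixed and "embeddingDim" not in fixed:
        fixed["embeddingDim"] = len(fixed["embedding"])
    return fixed
```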

Bulk Migration

For large datasets, use streaming uploads with concurrency control:

Python
import asyncio
import httpx

async def bulk_migrate(
    episodes: list[tuple[str, str]],  # (id, turtle)
    pod_url: str,
    auth_token: str,
    concurrency: int = 10
):
    """Upload episodes with concurrency control."""
    semaphore = asyncio.Semaphore(concurrency)

    # Reuse one client so uploads share a connection pool
    async with httpx.AsyncClient() as client:
        async def upload_one(episode_id: str, turtle: str) -> bool:
            async with semaphore:
                resp = await client.put(
                    f"{pod_url}/private/agent/memory/episodes/{episode_id}.ttl",
                    content=turtle,
                    headers={
                        "Content-Type": "text/turtle",
                        "Authorization": f"Bearer {auth_token}"
                    }
                )
                # 201 for newly created resources; 200/204 when overwriting
                return resp.status_code in (200, 201, 204)

        tasks = [upload_one(eid, ttl) for eid, ttl in episodes]
        results = await asyncio.gather(*tasks)

    success = sum(results)
    print(f"Migrated {success}/{len(episodes)} episodes")

Rollback

If migration fails, restore from backup:

  1. Keep source data unchanged until validation passes
  2. Use Pod's version history if available
  3. Delete migrated resources and retry
Shell
# Delete migrated episodes. Note: many Solid servers refuse to delete a
# non-empty container, so you may need to delete each resource first.
curl -X DELETE \
  -H "Authorization: Bearer $TOKEN" \
  "$POD_URL/private/agent/memory/episodes/"

Support

For migration assistance:


Part of the Agent Data Pod Specification