Migration Guide

Convert existing agent data formats to Agent Data Pod format.


Overview

Agent Data Pods use RDF/Turtle for data storage, following the Solid Protocol. RDF (Resource Description Framework) is a W3C standard for representing structured data as graphs of subject-predicate-object triples. Turtle is a human-readable syntax for RDF that balances readability with expressiveness. This format enables interoperability across agent implementations while preserving user ownership of their data.


Migration Strategies

Strategy 1: Direct Conversion

Best for: Structured data (JSON, XML, databases)

  1. Map source fields to Agent Data Pod vocabulary
  2. Generate Turtle documents
  3. Upload to Pod containers
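Step 1 of this strategy can be sketched as a simple field-mapping table. The source field names below (`created_at`, `score`, `labels`) are hypothetical, stand-ins for whatever your source schema uses:

```python
# Illustrative mapping from a hypothetical source schema to pod vocabulary terms
FIELD_MAP = {
    "content": "agent:content",
    "created_at": "dct:created",
    "score": "agent:importance",
    "labels": "agent:tag",
}

def map_fields(record: dict) -> dict:
    """Rename source fields to Agent Data Pod vocabulary terms (step 1).

    Fields with no mapping are dropped; extend FIELD_MAP to keep them.
    """
    return {FIELD_MAP[key]: value for key, value in record.items() if key in FIELD_MAP}
```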

Strategy 2: Embedding-First Migration

Best for: Unstructured data (text logs, chat history)

  1. Chunk source content
  2. Generate embeddings
  3. Create MemoryEpisode resources with embeddings
  4. Upload to /private/agent/memory/episodes/
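Step 1 (chunking) can be sketched as follows. The 800-character chunk size and 100-character overlap are illustrative defaults, not values mandated by the specification:

```python
def chunk_text(text: str, max_chars: int = 800, overlap: int = 100) -> list[str]:
    """Split unstructured text into overlapping chunks for embedding."""
    if max_chars <= overlap:
        raise ValueError("max_chars must exceed overlap")
    chunks = []
    start = 0
    while start < len(text):
        chunks.append(text[start:start + max_chars])
        # Step back by `overlap` so adjacent chunks share context
        start += max_chars - overlap
    return chunks
```

Each chunk then goes through your embedding model (step 2) before being wrapped in a MemoryEpisode resource (step 3).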

Common Source Formats

From JSON Chat History

Source format

JSON
{
  "messages": [
    {
      "id": "msg-001",
      "role": "user",
      "content": "Schedule a meeting for tomorrow",
      "timestamp": "2026-02-01T10:00:00Z"
    },
    {
      "id": "msg-002",
      "role": "assistant",
      "content": "I've scheduled a meeting for tomorrow at 9am.",
      "timestamp": "2026-02-01T10:00:05Z"
    }
  ]
}

Target format (Turtle)

Turtle
@prefix agent: <https://awkronos.github.io/web/vocab#> .
@prefix dct: <http://purl.org/dc/terms/> .
@prefix xsd: <http://www.w3.org/2001/XMLSchema#> .

<>
    a agent:MemoryEpisode ;
    agent:content "User requested meeting scheduling. Assistant scheduled meeting for tomorrow at 9am." ;
    dct:created "2026-02-01T10:00:00Z"^^xsd:dateTime ;
    agent:memoryType "episodic" ;
    agent:tag "calendar", "scheduling" ;
    agent:importance "0.6"^^xsd:decimal .
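A minimal conversion sketch for this source format follows. Summarising the message pair into a single content string is reduced here to a naive role-prefixed join, and `escape_turtle` is a helper introduced for illustration (Turtle string literals must have backslashes and quotes escaped):

```python
import json

def escape_turtle(value: str) -> str:
    """Escape backslashes and quotes for use inside a Turtle string literal."""
    return value.replace("\\", "\\\\").replace('"', '\\"')

def messages_to_episode(raw: str) -> str:
    """Convert a JSON chat export into a MemoryEpisode Turtle document."""
    messages = json.loads(raw)["messages"]
    # Naive summary: concatenate role-prefixed messages into one content string
    content = " ".join(f"{m['role']}: {m['content']}" for m in messages)
    created = messages[0]["timestamp"]
    return f'''@prefix agent: <https://awkronos.github.io/web/vocab#> .
@prefix dct: <http://purl.org/dc/terms/> .
@prefix xsd: <http://www.w3.org/2001/XMLSchema#> .

<>
    a agent:MemoryEpisode ;
    agent:content "{escape_turtle(content)}" ;
    dct:created "{created}"^^xsd:dateTime ;
    agent:memoryType "episodic" .
'''
```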

From SQLite/PostgreSQL

Source schema

SQL
CREATE TABLE agent_memory (
    id TEXT PRIMARY KEY,
    content TEXT NOT NULL,
    importance REAL,
    created_at TIMESTAMP,
    tags TEXT[]
);

-- Migration query
SELECT id, content, importance, created_at,
       array_to_string(tags, ',') as tags_csv
FROM agent_memory
ORDER BY created_at;
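A reading-side sketch of this migration, using the stdlib sqlite3 module. It assumes an SQLite variant of the schema in which tags are already stored as a CSV column (`tags_csv`), matching what the PostgreSQL query above produces with `array_to_string`:

```python
import sqlite3

def rows_to_episodes(db_path: str) -> list[tuple[str, dict]]:
    """Read agent_memory rows and map them to episode dicts keyed by id."""
    conn = sqlite3.connect(db_path)
    try:
        rows = conn.execute(
            "SELECT id, content, importance, created_at, tags_csv "
            "FROM agent_memory ORDER BY created_at"
        ).fetchall()
    finally:
        conn.close()
    episodes = []
    for row_id, content, importance, created_at, tags_csv in rows:
        episodes.append((row_id, {
            "content": content,
            "importance": importance,
            "created": created_at,
            "tags": tags_csv.split(",") if tags_csv else [],
        }))
    return episodes
```

Each dict can then be serialised to Turtle with whatever episode-templating helper your pipeline uses.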

From OpenAI Assistants API

Conversion considerations:

From LangChain Memory

Convert ConversationBufferMemory to episodes:

Python
from langchain.memory import ConversationBufferMemory

def convert_langchain_memory(memory) -> list:
    """Convert LangChain memory to Agent Data Pod format."""
    history = memory.load_memory_variables({})
    messages = history.get("history", "").split("\n")

    episodes = []
    for msg in messages:
        if msg.strip():
            episode = create_episode_from_text(msg)
            episodes.append(episode)

    return episodes

Embedding Migration

Compatible Formats

Embedding model compatibility matrix

  Source                         Dimensions  Format   Compatible?
  OpenAI ada-002                 1536        float32  Yes
  OpenAI text-embedding-3-small  1536        float32  Yes
  OpenAI text-embedding-3-large  3072        float32  Yes
  Cohere embed-v3                1024        float32  Yes
  Custom                         Any         float32  Yes
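A quick sanity check against this matrix can catch a truncated or mismatched vector before upload. The model names below are assumed identifiers for the rows above; custom models are accepted at any non-zero dimension:

```python
# Known embedding dimensions from the compatibility matrix above
KNOWN_DIMS = {
    "text-embedding-ada-002": 1536,
    "text-embedding-3-small": 1536,
    "text-embedding-3-large": 3072,
    "embed-v3": 1024,
}

def check_dimensions(model: str, embedding: list[float]) -> bool:
    """Verify an embedding's length matches the expected model dimension.

    Unknown (custom) models are accepted as long as the vector is non-empty.
    """
    expected = KNOWN_DIMS.get(model)
    if expected is None:
        return len(embedding) > 0
    return len(embedding) == expected
```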

Embedding Conversion

Python
import base64
from datetime import datetime, timezone

import numpy as np

def convert_embedding(embedding: list[float]) -> str:
    """Convert embedding to Agent Data Pod base64 format."""
    # Force little-endian float32 regardless of host byte order
    arr = np.array(embedding, dtype='<f4')
    binary = arr.tobytes()
    return base64.b64encode(binary).decode('ascii')

def create_episode_with_embedding(
    content: str,
    embedding: list[float],
    model: str
) -> str:
    """Create Turtle with embedding."""
    b64_embedding = convert_embedding(embedding)
    dim = len(embedding)
    # Escape backslashes and quotes so content is a valid Turtle literal
    safe_content = content.replace('\\', '\\\\').replace('"', '\\"')
    created = datetime.now(timezone.utc).strftime('%Y-%m-%dT%H:%M:%SZ')

    return f'''@prefix agent: <https://awkronos.github.io/web/vocab#> .
@prefix dct: <http://purl.org/dc/terms/> .
@prefix xsd: <http://www.w3.org/2001/XMLSchema#> .

<>
    a agent:MemoryEpisode ;
    agent:content "{safe_content}" ;
    dct:created "{created}"^^xsd:dateTime ;
    agent:embedding "{b64_embedding}"^^xsd:base64Binary ;
    agent:embeddingDim {dim} ;
    agent:embeddingFormat "float32-le" ;
    agent:embeddingModel "{model}" .
'''
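To confirm the encoding round-trips, the base64 payload can be decoded with the stdlib struct module; the `<{count}f` format string spells out the same little-endian float32 layout that agent:embeddingFormat ("float32-le") declares:

```python
import base64
import struct

def decode_embedding(b64: str) -> list[float]:
    """Decode a base64 float32-le embedding back into a list of floats."""
    binary = base64.b64decode(b64)
    count = len(binary) // 4  # 4 bytes per float32
    return list(struct.unpack(f"<{count}f", binary))
```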

Validation

After migration, validate your data with SHACL:

Shell
# Using Apache Jena SHACL
shacl validate \
  --shapes https://awkronos.github.io/web/vocab.ttl \
  --data /path/to/episodes/*.ttl

# Using pySHACL (takes one data file per invocation)
for f in episodes/*.ttl; do
  pyshacl -s vocab.ttl -df turtle "$f"
done

Common Validation Errors

Common SHACL validation errors and solutions

  Error                       Cause                        Fix
  Missing agent:content       Content field empty          Ensure all episodes have content
  Missing dct:created         No timestamp                 Add creation timestamp
  Invalid agent:importance    Value outside 0-1            Clamp to valid range
  Missing agent:embeddingDim  Embedding without dimension  Add dimension count
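The fixes in the table can be applied automatically before upload. This sketch assumes episodes are held as plain dicts prior to Turtle serialisation (the key names are illustrative, mirroring the vocabulary terms):

```python
from datetime import datetime, timezone

def repair_episode(episode: dict) -> dict:
    """Apply the fixes from the validation-error table to an episode dict."""
    fixed = dict(episode)
    # Missing agent:content: there is no safe placeholder, so fail loudly
    if not fixed.get("content"):
        raise ValueError("episode has no content")
    # Missing dct:created: add a timestamp
    if not fixed.get("created"):
        fixed["created"] = datetime.now(timezone.utc).strftime("%Y-%m-%dT%H:%M:%SZ")
    # Invalid agent:importance: clamp to the valid 0-1 range
    if "importance" in fixed:
        fixed["importance"] = min(1.0, max(0.0, float(fixed["importance"])))
    # Missing agent:embeddingDim: derive it from the embedding itself
    if "embedding" in fixed and "embeddingDim" not in fixed:
        fixed["embeddingDim"] = len(fixed["embedding"])
    return fixed
```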

Bulk Migration

For large datasets, use streaming uploads with concurrency control:

Python
import asyncio
import httpx

async def bulk_migrate(
    episodes: list[tuple[str, str]],  # (id, turtle)
    pod_url: str,
    auth_token: str,
    concurrency: int = 10
):
    """Upload episodes with concurrency control."""
    semaphore = asyncio.Semaphore(concurrency)

    # Reuse one client so uploads share a connection pool
    async with httpx.AsyncClient() as client:
        async def upload_one(episode_id: str, turtle: str) -> bool:
            async with semaphore:
                resp = await client.put(
                    f"{pod_url}/private/agent/memory/episodes/{episode_id}.ttl",
                    content=turtle,
                    headers={
                        "Content-Type": "text/turtle",
                        "Authorization": f"Bearer {auth_token}"
                    }
                )
                # 201 for newly created resources; 200/204 when overwriting
                return resp.status_code in (200, 201, 204)

        tasks = [upload_one(eid, ttl) for eid, ttl in episodes]
        results = await asyncio.gather(*tasks)

    success = sum(results)
    print(f"Migrated {success}/{len(episodes)} episodes")

Rollback

If migration fails, restore from backup:

  1. Keep source data unchanged until validation passes
  2. Use Pod's version history if available
  3. Delete migrated resources and retry
Shell
# Delete migrated episodes. Note: many Solid servers refuse to delete a
# non-empty container, so you may need to delete each resource first.
curl -X DELETE \
  -H "Authorization: Bearer $TOKEN" \
  "$POD_URL/private/agent/memory/episodes/"

Support

For migration assistance:


Part of the Agent Data Pod Specification