Your Multi-Cloud AI Strategy Is Already Non-Compliant
The EU Data Act is live. AI workloads are the worst offenders. Here's how to fix it.

The EU just made your AI stack a legal liability. Most US companies don't know it yet.
In September 2025, the EU Data Act became fully enforceable. It doesn't just regulate how you handle data — it mandates that your customers must be able to leave you. Easily. With everything. To any competitor you have. And if your architecture wasn't built for that from the start, you're not just non-compliant. You're operating illegally in one of the world's largest economic markets.
Here's the part that keeps infrastructure architects up at night: AI workloads are the most locked-in technology in the modern enterprise stack. And almost nobody building AI systems is thinking about portability.
That's about to change — one audit at a time.
The EU Data Act Is Not GDPR 2.0
Stop thinking about this as a data privacy law. The EU Data Act isn't about consent forms and cookie banners. It's about power — specifically, ending the ability of cloud providers to trap customers through architectural lock-in.
The three mandates that matter:
1. Easy provider switching. Customers must be able to switch cloud providers without facing unreasonable technical, contractual, or commercial barriers. "Unreasonable" is being defined by regulators, not by your vendor's SLA.
2. Full data portability. All exportable data must be portable — in open, machine-readable formats — within 30 days of a customer request. Not "eventually." Not "in the next release cycle." Thirty days.
3. No egress fees as a lock-in mechanism. Providers can't weaponize data transfer costs to make switching economically painful. The free data egress provisions start ramping in 2027, but the portability requirements are live now.
The target is any company providing data processing services to EU customers. If you have a single EU enterprise client — one — and your AI system processes their data, you're in scope.
The audits have started. The first enforcement actions will be visible by mid-2026.
Why AI Workloads Are the Worst Offenders
Traditional cloud lock-in is mostly about proprietary services: AWS RDS versus Aurora, Azure Functions versus Lambda, Google Spanner versus Bigtable. Painful to migrate, but theoretically possible with enough engineering effort and budget.
AI lock-in is different. It's deeper, less visible, and harder to unwind.
Model API Coupling
The fastest way to build an AI product is to call openai.chat.completions.create() directly from your application code. Millions of developers have done exactly this. The problem: that's not a cloud service you're using. That's a dependency you've embedded in your architecture.
When OpenAI changes pricing, changes behavior, gets acquired, or loses service reliability — you have no fallback. More relevantly, when your EU enterprise customer asks you to demonstrate provider portability under the Data Act, you can't. Your application is OpenAI. Swapping it out requires a rewrite, not a configuration change.
Embedding Model Lock-In
This one is sneaky. If you're running RAG (retrieval-augmented generation), your entire knowledge base is encoded as vectors — mathematical representations of text that only make sense in the context of the specific embedding model that created them.
Move from text-embedding-ada-002 to text-embedding-3-small? Regenerate every vector in your database. Move from OpenAI embeddings to Cohere or Google's embedding models? Same problem. Your 50 million document chunks need to be re-embedded from scratch, and the semantic relationships between them will shift.
This isn't just migration cost. It's compliance risk. "All exportable data" under the EU Data Act includes the vectors that encode your customer's proprietary documents and workflows. If those vectors are only meaningful in the context of one provider's embedding model, you haven't actually made the data portable — you've made it dependent.
Vector Database Coupling
Pinecone, Weaviate, Qdrant, Chroma, pgvector — each has different index structures, query languages, metadata schemas, and operational characteristics. If your AI system's retrieval layer is built tightly around Pinecone's API, the effort to migrate to a different provider isn't just technical. It requires re-testing every query pattern, every hybrid search combination, every metadata filter that your production system relies on.
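The same abstraction discipline that works for model providers works here. A minimal sketch, assuming a two-method retrieval interface — the `VectorStore` protocol, its method names, and the toy `InMemoryVectorStore` backend below are illustrative, not any vendor's actual API; a Pinecone, Qdrant, or pgvector adapter would implement the same two methods:

```python
from typing import Protocol

class VectorStore(Protocol):
    """Retrieval interface the application codes against — not a vendor SDK."""
    def upsert(self, chunk_id: str, vector: list[float], metadata: dict) -> None: ...
    def query(self, vector: list[float], top_k: int = 5) -> list[dict]: ...

class InMemoryVectorStore:
    """Toy reference backend; real adapters wrap a vendor client behind the same methods."""
    def __init__(self):
        self._rows: dict[str, tuple[list[float], dict]] = {}

    def upsert(self, chunk_id: str, vector: list[float], metadata: dict) -> None:
        self._rows[chunk_id] = (vector, metadata)

    def query(self, vector: list[float], top_k: int = 5) -> list[dict]:
        # Rank stored vectors by dot-product similarity against the query vector
        def dot(a: list[float], b: list[float]) -> float:
            return sum(x * y for x, y in zip(a, b))
        ranked = sorted(self._rows.items(), key=lambda kv: dot(vector, kv[1][0]), reverse=True)
        return [
            {"chunk_id": cid, "score": dot(vector, vec), **meta}
            for cid, (vec, meta) in ranked[:top_k]
        ]

store: VectorStore = InMemoryVectorStore()
store.upsert("c1", [1.0, 0.0], {"raw_text": "termination clause"})
store.upsert("c2", [0.0, 1.0], {"raw_text": "pricing schedule"})
hits = store.query([0.9, 0.1], top_k=1)
```

The point is that query patterns, hybrid search, and metadata filters get re-tested against one interface, not against each vendor's SDK.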
Fine-Tuned Models
This is the deep end. If you've fine-tuned a model on your customer's proprietary data — training a base model to understand their domain, their terminology, their workflows — that fine-tuned model is an asset. Under the EU Data Act, that asset needs to be portable.
Fine-tunes on OpenAI's platform live on OpenAI's infrastructure. They're not exportable in a format that another provider can load and run. The weights may belong to you contractually, but you often can't access them in a way that enables actual portability.
What Non-Compliance Actually Means
Let's be precise about the risk. "Non-compliant" under the EU Data Act is not the same as "we'll get a nastygram from a regulator in three years." The enforcement mechanism is faster and more direct than GDPR was in its early years.
EU regulators learned from the GDPR rollout. They've built enforcement infrastructure. National data protection authorities are already coordinating on AI-specific investigations. Enterprise B2B contracts with EU companies increasingly include Data Act compliance as a contractual requirement — meaning your customers can sue you for breach before a regulator even gets involved.
The practical risk profile for a US company with EU enterprise customers:
- Contract risk: EU customers start adding portability clauses. You can't demonstrate compliance. You lose deals or face breach claims.
- Regulatory risk: An EU customer files a complaint. National authority investigates. Fines up to 4% of global annual turnover for non-compliance with portability obligations.
- Reputational risk: The first high-profile enforcement actions will be public. Being named as a company that built non-portable AI infrastructure is not a headline you want.
Most US companies won't get hit in 2026. But the ones that haven't started building for portability in 2026 will face a much harder retrofit in 2027 and 2028 when enforcement ramps up.
What Compliant AI Architecture Actually Looks Like
The fix is not complicated. It is, however, a different way of thinking about how you structure AI systems from day one.
The core principle: treat model providers as interchangeable infrastructure, not as foundations.
The Abstraction Layer Pattern
Instead of calling OpenAI (or Anthropic, or Gemini) directly, you build a thin interface layer that your application code talks to. Behind that interface, you can swap providers with a configuration change.
```python
# ❌ Non-portable: direct provider coupling
import os

from openai import OpenAI

client = OpenAI(api_key=os.getenv("OPENAI_API_KEY"))

def get_response(user_message: str) -> str:
    response = client.chat.completions.create(
        model="gpt-4o",
        messages=[{"role": "user", "content": user_message}],
    )
    return response.choices[0].message.content
```
```python
# ✅ Portable: provider abstraction layer
import os
from typing import Protocol

class LLMProvider(Protocol):
    def complete(self, messages: list[dict], **kwargs) -> str: ...

class OpenAIProvider:
    def __init__(self):
        from openai import OpenAI
        self.client = OpenAI(api_key=os.getenv("OPENAI_API_KEY"))

    def complete(self, messages: list[dict], **kwargs) -> str:
        response = self.client.chat.completions.create(
            model=kwargs.get("model", "gpt-4o"),
            messages=messages,
        )
        return response.choices[0].message.content

class AnthropicProvider:
    def __init__(self):
        import anthropic
        self.client = anthropic.Anthropic(api_key=os.getenv("ANTHROPIC_API_KEY"))

    def complete(self, messages: list[dict], **kwargs) -> str:
        response = self.client.messages.create(
            model=kwargs.get("model", "claude-sonnet-4-6"),
            max_tokens=4096,
            messages=messages,
        )
        return response.content[0].text

class LocalOllamaProvider:
    def complete(self, messages: list[dict], **kwargs) -> str:
        import requests
        response = requests.post(
            "http://localhost:11434/api/chat",
            json={"model": kwargs.get("model", "qwen2.5"), "messages": messages, "stream": False},
        ).json()
        return response["message"]["content"]

# Config-driven provider selection
PROVIDERS = {
    "openai": OpenAIProvider,
    "anthropic": AnthropicProvider,
    "local": LocalOllamaProvider,
}

def get_llm() -> LLMProvider:
    provider_name = os.getenv("LLM_PROVIDER", "anthropic")
    return PROVIDERS[provider_name]()

# Application code never knows which provider it's talking to
llm = get_llm()
response = llm.complete([{"role": "user", "content": "Summarize this contract."}])
```
Switching providers is now a single environment variable change — no code rewrite. When an auditor asks for proof of portability, you change LLM_PROVIDER from openai to anthropic and everything keeps working.
Portable Embeddings
The embedding portability problem requires a slightly different approach: store both the raw source content and the embedding vectors, and version your embedding model explicitly.
```python
# ✅ Portable embedding storage schema
from dataclasses import dataclass
from datetime import datetime

@dataclass
class PortableChunk:
    chunk_id: str
    source_document_id: str
    raw_text: str              # Always store the source — this is what's portable
    embedding: list[float]     # The vector representation
    embedding_model: str       # Which model produced this: "text-embedding-3-small"
    embedding_provider: str    # Which provider: "openai"
    embedding_version: str     # Model version for reproducibility
    created_at: datetime

    def to_exportable_dict(self) -> dict:
        """EU Data Act compliant export — includes raw text, not just vectors"""
        return {
            "chunk_id": self.chunk_id,
            "source_document_id": self.source_document_id,
            "raw_text": self.raw_text,  # The actual portable content
            "embedding_metadata": {
                "model": self.embedding_model,
                "provider": self.embedding_provider,
                "version": self.embedding_version,
                "note": "Re-embed raw_text with your preferred model for full portability",
            },
        }
```
Key principle: the raw_text is the portable asset. The embedding is a derivative. When a customer exercises their Data Act portability rights, you give them the raw text. They (or their next vendor) can re-embed it with whatever model they choose.
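In practice, that re-embedding step is a pipeline over the exported chunks. A hedged sketch, assuming the export format above — `migrate_chunks` and `embed_fn` are hypothetical names, and `embed_fn` stands in for whatever embedding call the receiving side uses (the example below uses a toy function so the pipeline runs without any provider):

```python
from typing import Callable

def migrate_chunks(
    exported: list[dict],
    embed_fn: Callable[[str], list[float]],  # the new provider's embedding call
    new_model: str,
    new_provider: str,
) -> list[dict]:
    """Rebuild vectors from exported raw_text; old vectors are dropped as non-portable derivatives."""
    migrated = []
    for chunk in exported:
        migrated.append({
            "chunk_id": chunk["chunk_id"],
            "source_document_id": chunk["source_document_id"],
            "raw_text": chunk["raw_text"],
            "embedding": embed_fn(chunk["raw_text"]),  # fresh vector under the new model
            "embedding_metadata": {"model": new_model, "provider": new_provider},
        })
    return migrated

# Toy embed_fn: vector is just [len(text), 0.0] — replace with a real embedding client
fake_embed = lambda text: [float(len(text)), 0.0]
out = migrate_chunks(
    [{"chunk_id": "c1", "source_document_id": "d1", "raw_text": "hello"}],
    embed_fn=fake_embed,
    new_model="text-embedding-3-small",
    new_provider="openai",
)
```

Because the loop only reads raw_text, the old provider's vectors never constrain the migration.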
Provider-Agnostic Agent Architecture
The same principle applies to the full agent stack. CGAI's approach is to build agents that declare their capabilities abstractly and receive their tools as injected dependencies:
```python
# ✅ Provider-agnostic agent
class ResearchAgent:
    def __init__(
        self,
        llm: LLMProvider,             # Injected — can be any provider
        search_tool: SearchProvider,  # Injected — Brave, Google, DuckDuckGo
        storage: StorageProvider,     # Injected — S3, GCS, local, NAS
    ):
        self.llm = llm
        self.search = search_tool
        self.storage = storage

    def research(self, topic: str) -> str:
        search_results = self.search.query(topic)
        summary = self.llm.complete([
            {"role": "system", "content": "You are a research analyst."},
            {"role": "user", "content": f"Synthesize: {search_results}"},
        ])
        self.storage.save(f"research/{topic}.md", summary)
        return summary

# Instantiate with EU-compliant, swappable providers
agent = ResearchAgent(
    llm=AnthropicProvider(),
    search_tool=BraveSearchProvider(),
    storage=S3Provider(bucket="customer-data-eu-west-1"),
)
```
Need to switch from Anthropic to a local model for a cost-sensitive deployment? One line:
```python
agent = ResearchAgent(llm=LocalOllamaProvider(), ...)
```
The agent doesn't know, doesn't care, and your architecture is now portable.
The 90-Day Action Plan
If you have EU enterprise customers and you're running AI systems against their data, here's where to start.
Days 1–30: Audit and assess
Map every place your codebase makes a direct API call to an AI provider. OpenAI, Anthropic, Cohere, Google — all of them. For each integration:
- What data from EU customers flows through this call?
- Is the output stored? In what format? With what metadata?
- Can we recreate this output without this specific provider?
This audit will be uncomfortable. Most teams discover they're more locked-in than they thought.
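The mapping step can start out mechanical. A rough sketch of a codebase scanner — the pattern list is illustrative, not exhaustive, and `audit_direct_calls` is a hypothetical helper, not a real tool:

```python
import re
from pathlib import Path

# Illustrative signatures of direct provider coupling — extend for your stack
DIRECT_CALL_PATTERNS = [
    r"from openai import",
    r"import anthropic",
    r"import cohere",
    r"google\.generativeai",
]

def audit_direct_calls(root: str) -> list[tuple[str, int, str]]:
    """Return (file, line_number, matched_line) for every direct provider reference found."""
    hits = []
    for path in Path(root).rglob("*.py"):
        for lineno, line in enumerate(path.read_text(errors="ignore").splitlines(), 1):
            if any(re.search(pattern, line) for pattern in DIRECT_CALL_PATTERNS):
                hits.append((str(path), lineno, line.strip()))
    return hits
```

Each hit then gets the three questions above applied to it by a human; the script only tells you where to look.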
Days 31–60: Build the abstraction layer
Implement the provider interface pattern. Don't try to migrate everything at once — start with your highest-volume, highest-risk integration. Build the interface. Implement two providers behind it. Test switching. Now you have proof that portability is achievable.
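"Test switching" can be a literal test: run one contract suite against every provider behind the interface. A minimal sketch using a stub — `FakeProvider` and `check_provider_contract` are illustrative names; in CI you would run the same checks over your real adapters:

```python
class FakeProvider:
    """Stub that satisfies the same complete() contract as real provider adapters."""
    def __init__(self, canned: str = "ok"):
        self.canned = canned
        self.calls: list[list[dict]] = []  # record what the application sent

    def complete(self, messages: list[dict], **kwargs) -> str:
        self.calls.append(messages)
        return self.canned

def check_provider_contract(provider) -> None:
    """The portability proof: any conforming provider passes identical assertions."""
    out = provider.complete([{"role": "user", "content": "ping"}])
    assert isinstance(out, str) and out  # a non-empty string back, regardless of vendor

# Run the contract against each provider you claim is swappable
for provider in [FakeProvider("pong"), FakeProvider("ok")]:
    check_provider_contract(provider)
```

A passing run of this suite against two real providers is exactly the artifact an auditor or an enterprise customer will ask to see.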
Simultaneously: update your data storage schema to always preserve raw source content alongside derived representations (embeddings, summaries, classifications). Future portability is much easier when you haven't thrown away the originals.
Days 61–90: Document and contractualize
Update your technical documentation to reflect your portability architecture. If you have enterprise contracts with EU customers, proactively add a portability appendix — describe how data can be exported, in what format, on what timeline. Getting ahead of this in contract negotiations is far less painful than responding to a customer demand or a regulatory inquiry.
For new enterprise deals: make portability a feature, not a disclaimer. "Our AI systems are designed from day one to be provider-portable and fully compliant with EU Data Act portability requirements" is a sales advantage as EU procurement teams start adding compliance requirements to their vendor evaluation criteria.
What CGAI Builds — And Why
Every AI system we build at CGAI starts with a portability assumption: the customer should be able to take their data and their AI capabilities elsewhere, on 30 days' notice, to any provider they choose.
This isn't altruism. It's architecture discipline. Systems designed for portability are also more resilient, easier to test, and less expensive to operate — because you can route to cheaper models when the task doesn't require the most expensive one, and you're not dependent on any single provider's uptime or pricing decisions.
The EU Data Act is forcing a discipline that good architects were already practicing. If you're not there yet, the clock is running.
The audits will start with the biggest targets — enterprises processing the most EU customer data. But they'll expand. And the companies that built portable architectures will sail through those audits with a configuration file and a demo. The ones that didn't will spend 18 months in a retrofit they could have avoided.
Build for portability now. The regulator isn't asking nicely.
Marc Wojcik is the founder of The CGAI Group, an AI consulting and products studio specializing in agentic AI systems. CGAI builds AI infrastructure for enterprises that need to move fast without building technical debt — including EU Data Act compliant architectures. Reach him at marc@thecgaigroup.com.

