How to Add a Pre-Trade Trust Gate to Your Coinbase AgentKit Agent

2026-05-15 · Tutorial agentkit x402 agent-security

When an LLM-driven agent makes an on-chain swap on behalf of a user, the agent fills in the destination, the amount, the slippage tolerance — and then signs. There's no human in the loop reading the transaction back before broadcast. Whatever the agent decided to do, the wallet does.

That's the safety surface. The agent's decision-making is only as good as the inputs it was given. If a malicious instruction enters the prompt context, or a token address comes from a poisoned data source, the agent will faithfully execute against the wrong destination.

The defense is straightforward in shape and surprisingly underused in practice: run a verification check between "agent has formed a plan" and "wallet signs the transaction." This post walks through wiring that check into a Coinbase AgentKit agent on Base using a single ActionProvider, with no changes to the agent loop itself.

The package is @paladinfi/agentkit-actions (published on npm, source at github.com/paladinfi/agentkit-actions), MIT-licensed, currently at v0.1.1. It exposes one tool to your agent: paladin_trust_check. The LLM decides when to call it, typically before a swap or transfer.

The verification pattern

The pattern itself isn't novel. Any production trading system runs sanity checks before signing — price oracles, slippage bounds, balance precondition checks. The shape AI agents need is a generalization:

The critical word is independent. If the check is performed by the same source that produced the suspicious instruction, it's not a check — it's part of the attack surface. The destination of the verification call should be a service the agent could not have been talked into routing the verification to by a malicious prompt. In practice that means: the trust gate's endpoint is hard-coded in the ActionProvider, not parameterized from the agent's input.

@paladinfi/agentkit-actions enforces this by construction — the endpoint, the USDC contract for x402 payment, the treasury address, the payment amount cap, and the EIP-712 domain are all hard-coded into the package's source. A compromised PaladinFi server cannot redirect this action's signed x402 authorization to a different recipient, asset, or chain — the constants are validated client-side before viem signs. (This guarantee scopes to the trust-check action only; supply-chain integrity of the npm package itself is a separate concern — verify the published constants in src/x402/validate.ts and pin the version.)
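To make "hard-coded, not parameterized" concrete, here is a minimal sketch of the idea. It is not the package's actual source: the names TRUST_ENDPOINT, TrustCheckInput, and buildTrustRequest are hypothetical, and the request body (address plus chainId) is an assumption inferred from the response fields shown later in this post.

```typescript
// Hypothetical sketch: the verification endpoint is a module constant,
// never a value the LLM (or anything in its prompt context) can supply.
const TRUST_ENDPOINT = "https://swap.paladinfi.com/v1/trust-check";

// The only fields the agent may provide. Note there is no `url` field.
// (Assumed request shape, not confirmed by the package docs.)
interface TrustCheckInput {
  address: string;
  chainId: number;
}

interface TrustRequest {
  url: string;
  method: "POST";
  body: string;
}

// Builds the outgoing request from untrusted agent input. Only the two
// whitelisted fields are copied, so an injected `url` or `endpoint` key
// is silently dropped rather than honored.
function buildTrustRequest(input: unknown): TrustRequest {
  const raw = (input ?? {}) as Record<string, unknown>;
  const address = raw.address;
  const chainId = raw.chainId;
  if (typeof address !== "string" || !/^0x[0-9a-fA-F]{40}$/.test(address)) {
    throw new Error("address must be a 20-byte hex string");
  }
  if (typeof chainId !== "number" || !isFinite(chainId) || Math.floor(chainId) !== chainId) {
    throw new Error("chainId must be an integer");
  }
  const payload: TrustCheckInput = { address: address, chainId: chainId };
  return { url: TRUST_ENDPOINT, method: "POST", body: JSON.stringify(payload) };
}
```

The design choice worth noticing is that the destination is not part of the input type at all, so there is nothing for a malicious prompt to override.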

Installation

npm install @paladinfi/agentkit-actions
# or pnpm add / bun add

Peer dependency: @coinbase/agentkit@0.10.4 (pinned exactly — AgentKit's API is still stabilizing).

Operated by Malcontent Games LLC (Michigan); dev@paladinfi.com for security disclosure. Source MIT-licensed at github.com/paladinfi/agentkit-actions.

If you're migrating from @paladinfi/agentkit-actions@0.0.x, drop the walletClientAccount: ... argument — paid mode now wires through the AgentKit wallet provider's toSigner() automatically. The legacy argument throws with a migration message at boot.

Wiring into AgentKit (preview mode)

Preview mode is free, returns sample-fixture data, and is the right starting point for development. It lets you confirm the tool registers, the LLM uses it appropriately, and the response shape integrates with your agent's reasoning before you wire up payment.

import { AgentKit } from "@coinbase/agentkit";
import { paladinTrustActionProvider } from "@paladinfi/agentkit-actions";
import { getLangChainTools } from "@coinbase/agentkit-langchain";
// or: import { getVercelAITools } from "@coinbase/agentkit-vercel-ai-sdk";

const agentkit = await AgentKit.from({
  walletProvider, // your existing EvmWalletProvider on Base
  actionProviders: [
    paladinTrustActionProvider(), // mode: "preview" by default
    // ...your other providers
  ],
});

const tools = await getLangChainTools(agentkit);

That's the full integration. The agent now has a paladin_trust_check tool. When the LLM decides to call it (e.g., before a swap), it'll hit POST /v1/trust-check/preview against swap.paladinfi.com, no auth, no payment.

Preview responses are explicitly marked as samples: every factor has real: false, and the recommendation carries a sample- prefix (sample-allow / sample-warn / sample-block). This prevents a preview-mode screenshot from being cropped into a misleading "real" assessment downstream. You can develop and demo against preview indefinitely; only flip to paid when production calls need to drive real decisions.

What the agent sees

When the LLM calls paladin_trust_check with a token address, it gets a structured response. Here's the actual preview-mode shape (live response from swap.paladinfi.com/v1/trust-check/preview against Base USDC):

{
  "address": "0x833589fCD6eDb6E08f4c7C32D4f71b54bdA02913",
  "chainId": 8453,
  "taker": null,
  "request_id": "b7795c5e-ae26-4e44-b78b-750ba8767435",
  "trust": {
    "risk_score": null,
    "risk_score_scale": "0-100 (lower = safer); null on preview because no live evaluation runs",
    "recommendation": "sample-allow",
    "recommendation_enum": ["allow", "warn", "block"],
    "factors": [
      { "source": "ofac",             "signal": "not_listed", "details": "Live on paid endpoint; not evaluated on preview", "real": false },
      { "source": "etherscan_source", "signal": "verified",   "details": "SAMPLE — illustrative only",                       "real": false },
      { "source": "goplus",           "signal": "ok",         "details": "SAMPLE — illustrative only",                       "real": false },
      { "source": "anomaly",          "signal": "ok",         "details": "SAMPLE — illustrative only",                       "real": false }
    ],
    "version": "1.1",
    "_preview": true
  }
}

In paid mode (the production response from /v1/trust-check), the shape is identical, with three differences: recommendation is one of allow / warn / block (no sample- prefix), each factor's real field is true for successfully evaluated sources, and risk_score is a number on the 0-100 scale.
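Because the two modes share a shape, it can help to guard against a preview response ever driving a real decision. The sketch below is one way to do that. The interfaces are transcribed from the JSON above rather than taken from an official type definition, and isLiveVerdict is a hypothetical helper name. Whether paid responses omit _preview or set it to false is not stated here, so the check tolerates both.

```typescript
// Shape transcribed from the preview JSON in this post (an approximation,
// not an official schema).
interface TrustFactor {
  source: string;
  signal: string;
  details: string;
  real: boolean;
}

interface TrustCheckResponse {
  address: string;
  chainId: number;
  trust: {
    risk_score: number | null;
    recommendation: string;
    factors: TrustFactor[];
    version: string;
    _preview?: boolean;
  };
}

// Returns true only when the response looks like a live (paid-mode) verdict.
// Any preview marker, or a recommendation outside allow/warn/block, yields false.
function isLiveVerdict(res: TrustCheckResponse): boolean {
  const t = res.trust;
  if (t._preview === true) return false;
  if (t.recommendation.indexOf("sample-") === 0) return false;
  if (typeof t.risk_score !== "number") return false;
  const r = t.recommendation;
  return r === "allow" || r === "warn" || r === "block";
}
```

Individual factors are deliberately not required to have real: true, since paid mode only sets it for sources that were successfully evaluated.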

recommendation is the actionable summary. The intended pattern: agent proceeds on allow, surfaces a warning to the user on warn, abstains entirely on block. The factors array lets the agent (or you, in a system prompt) reason about why — useful when you want to log decisions, route warns through human-confirmation, or weight specific factors differently for your use case.
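The allow / warn / block pattern can be sketched as a small mapping function. This is an illustration of the intended pattern, not code from the package: AgentAction and actionFor are hypothetical names, and treating every unrecognized value (including the sample- variants) as "abstain" is an assumption chosen so the mapping fails closed.

```typescript
type AgentAction = "proceed" | "warn_user" | "abstain";

// Maps a recommendation to an agent action. Anything that is not exactly
// "allow" or "warn" (block, sample-* values, typos, a missing field)
// falls through to "abstain", so unexpected input never results in a trade.
function actionFor(recommendation: unknown): AgentAction {
  switch (recommendation) {
    case "allow":
      return "proceed";
    case "warn":
      return "warn_user";
    default:
      return "abstain";
  }
}
```

In an agent loop you would call actionFor(response.trust.recommendation) after the tool returns and branch on the result before any signing step.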

A few specific behaviors worth knowing:

What gets checked

The trust gate is composed of four independent signals, each evaluated by the paid endpoint:

| source value | Upstream | What it catches |
| --- | --- | --- |
| ofac | U.S. Treasury SDN XML feed (cryptocurrency-tagged via Feature 345 / Detail 1432) | Sanctioned-address swaps. Refreshed every 24h from Treasury. Match here forces recommendation: block regardless of other signals. |
| goplus | GoPlus trust-list + token-security API | Known-malicious contract patterns: honeypots, blacklist functions, owner-can-mint, transfer-pause, hidden-fee, etc. |
| etherscan_source | Etherscan getSourceCode | Unverified-source contracts (a weak but consistent signal; most legitimate tokens are source-verified within days of deploy). |
| anomaly | On-call analysis | Contract age windows (under 1h / 24h / 7d), address-kind classification (contract vs. EOA via eth_getCode), and no-outbound transaction history. |
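To give a feel for the anomaly row, here is a rough sketch of the two simplest pieces: address-kind classification from an eth_getCode result, and bucketing a contract's age into the windows listed. The function names and bucket labels are hypothetical, and the service's real logic is not shown in this post, so treat this as an illustration of the technique only.

```typescript
type AddressKind = "eoa" | "contract";
type AgeWindow = "under_1h" | "under_24h" | "under_7d" | "older";

// eth_getCode returns "0x" when no bytecode is stored at the address.
// Caveat: an account with an EIP-7702 delegation also returns non-empty
// code, so a production classifier needs more than this single check.
function addressKind(code: string): AddressKind {
  return code === "0x" || code === "" ? "eoa" : "contract";
}

// Buckets an age in seconds into the windows named in the table.
function ageWindow(ageSeconds: number): AgeWindow {
  if (ageSeconds < 60 * 60) return "under_1h";
  if (ageSeconds < 24 * 60 * 60) return "under_24h";
  if (ageSeconds < 7 * 24 * 60 * 60) return "under_7d";
  return "older";
}
```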

This isn't a magic detector for every kind of malicious token. It's a layered gate where each layer catches a different class of failure, and the layered response gives the agent (or operator) enough information to decide the right action.
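Since the factors array is returned alongside the recommendation, a cautious integrator can re-check the OFAC rule from the table locally instead of relying on the summary alone. The sketch below is an optional client-side policy, not something the package is documented to do. It is deliberately stricter than the stated rule: it assumes "not_listed" is the only clear signal and downgrades to block whenever the ofac factor is missing or was not actually evaluated. All names are hypothetical.

```typescript
interface Factor {
  source: string;
  signal: string;
  real: boolean;
}

type Recommendation = "allow" | "warn" | "block";

// True only if at least one ofac factor exists and every ofac factor was
// really evaluated (real: true) and came back "not_listed".
function ofacClear(factors: Factor[]): boolean {
  const ofac = factors.filter(function (f) { return f.source === "ofac"; });
  if (ofac.length === 0) return false;
  return ofac.every(function (f) { return f.real && f.signal === "not_listed"; });
}

// Keeps the server's recommendation when OFAC is clear; otherwise blocks.
function effectiveRecommendation(rec: Recommendation, factors: Factor[]): Recommendation {
  return ofacClear(factors) ? rec : "block";
}
```

Note that this policy blocks every preview response, because preview factors always have real: false. That is intended for production paths and would need relaxing in development.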

The signals you won't find here, by design:

This is one of the honest tradeoffs of a pre-trade gate: you can only block what you can detect quickly. Signals that require deep analysis fit a different layer of the stack.

Switching to paid mode

Once you've confirmed the integration works in preview, switch to paid mode for production. The flag is one line:

const agentkit = await AgentKit.from({
  walletProvider,
  actionProviders: [
    paladinTrustActionProvider({ mode: "paid" }),
  ],
});

Paid mode costs $0.001 USDC per call on Base, settled via x402 EIP-3009. The agent's wallet (any EvmWalletProvider — Viem, CDP, Privy, etc.) signs an EIP-3009 authorization; the x402 facilitator submits the on-chain transfer and pays gas. No ETH is required from the agent's wallet — x402 EIP-3009 settlement is gasless from the signer's perspective. Fund the agent's wallet with ~$0.10 USDC (100 calls at the declared price) plus a small buffer for the per-call $0.01 client-side cap (see the pre-sign safety section below).
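For readers who have not seen EIP-3009 before, the thing the wallet signs is an EIP-712 typed-data message named TransferWithAuthorization. The sketch below only builds that structure; it does not sign or send anything, and the package does this for you. The field list follows the EIP-3009 standard. The domain name "USD Coin" and version "2" are my assumption about Base USDC's EIP-712 domain and should be verified against the contract; buildTransferAuthorization is a hypothetical helper name.

```typescript
const USDC_BASE = "0x833589fCD6eDb6E08f4c7C32D4f71b54bdA02913"; // Base USDC
const PRICE_ATOMIC = "1000"; // $0.001 in USDC's 6-decimal units (0.001 * 10^6)

interface Authorization {
  from: string;        // the agent's wallet
  to: string;          // the payee named in the 402 challenge
  value: string;       // atomic units as a decimal string
  validAfter: string;  // unix seconds
  validBefore: string; // unix seconds
  nonce: string;       // 32-byte hex, unique per authorization
}

// Builds the EIP-712 typed data for an EIP-3009 transfer authorization.
function buildTransferAuthorization(auth: Authorization) {
  return {
    domain: {
      name: "USD Coin", // assumption: verify against the deployed contract
      version: "2",     // assumption: verify against the deployed contract
      chainId: 8453,
      verifyingContract: USDC_BASE,
    },
    types: {
      TransferWithAuthorization: [
        { name: "from", type: "address" },
        { name: "to", type: "address" },
        { name: "value", type: "uint256" },
        { name: "validAfter", type: "uint256" },
        { name: "validBefore", type: "uint256" },
        { name: "nonce", type: "bytes32" },
      ],
    },
    primaryType: "TransferWithAuthorization",
    message: auth,
  };
}
```

Because the signature authorizes a token transfer rather than a transaction, someone else (here the x402 facilitator) can submit it on-chain and pay the gas.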

Paid responses return live evaluations with real: true on successful factors. The schema is otherwise identical to preview — same fields, same recommendation semantics, same fail-closed contract.

The pre-sign safety guarantee

This is the differentiator that justifies the package existing alongside AgentKit's generic x402ActionProvider.

A generic x402 transport lets your agent make HTTP-via-x402 calls to any URL the LLM produces. The wallet signs whatever authorization the server's 402 challenge demands. If the server is compromised, or the URL is rewritten by a prompt-injection attack, the signed authorization could go to an unintended recipient.

@paladinfi/agentkit-actions validates the server's 402 challenge against hard-coded constants before viem signs:

If any field deviates, the call aborts client-side before viem signs, with an error prefixed paladin-trust BLOCKED pre-sign: so operators can grep and alert. Within this action's scope, a compromised PaladinFi server cannot redirect a signed authorization to a different recipient, asset, or chain — payment-path safety holds even if the server is fully adversarial.
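A minimal sketch of what a pre-sign check of this kind can look like follows. It is not the contents of src/x402/validate.ts. The payTo value is a placeholder rather than PaladinFi's treasury address, the "base" network label and "exact" scheme are assumptions about the x402 challenge format, and the EIP-712 domain check mentioned earlier is omitted for brevity. The error prefix and the $0.01 cap come from this post.

```typescript
// Hypothetical constants for illustration only.
const EXPECTED = {
  scheme: "exact",
  network: "base",
  asset: "0x833589fCD6eDb6E08f4c7C32D4f71b54bdA02913", // Base USDC
  payTo: "0x0000000000000000000000000000000000000001", // PLACEHOLDER, not the real treasury
  maxAmount: 10000, // $0.01 in USDC's 6-decimal units
};

// Subset of the payment requirements a 402 challenge carries (assumed field names).
interface PaymentRequirements {
  scheme: string;
  network: string;
  asset: string;
  payTo: string;
  maxAmountRequired: string; // atomic units as a decimal string
}

// Throws before anything is signed if the challenge deviates from the constants.
function assertChallengeMatches(req: PaymentRequirements): void {
  const fail = (why: string): never => {
    throw new Error("paladin-trust BLOCKED pre-sign: " + why);
  };
  const same = (a: string, b: string): boolean => a.toLowerCase() === b.toLowerCase();

  if (req.scheme !== EXPECTED.scheme) fail("unexpected scheme");
  if (req.network !== EXPECTED.network) fail("unexpected network");
  if (!same(req.asset, EXPECTED.asset)) fail("unexpected asset");
  if (!same(req.payTo, EXPECTED.payTo)) fail("unexpected recipient");
  if (!/^\d+$/.test(req.maxAmountRequired)) fail("amount is not an integer");
  if (Number(req.maxAmountRequired) > EXPECTED.maxAmount) fail("amount exceeds cap");
}
```

The signing call would come only after assertChallengeMatches returns, so a deviating challenge never reaches the wallet.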

A separate question worth answering directly: can a compromised PaladinFi server drain my agent's wallet via the trust verdict itself? The failure mode if our server is compromised is false-allow (returning recommendation: "allow" on a malicious token), not direct fund-loss. The trust-verdict logic is the part where you have to trust our evaluation — which is why the source is public, the response shape is versioned (trust.version), and the architectural mitigation is layered defense: this gate handles "is this destination known-bad"; your agent should also enforce its own spending caps, allowlists for high-value transfers, and slippage tolerances. Defense-in-depth is the rule; this is one layer of it.

The hard-coding is the architectural commitment. If we ever need to change the treasury or the price, it's a package version bump (visible in npm diff and to anyone pinning the version), not a server-side config flip. Operators downstream see the change and accept it explicitly.

You can verify the constants yourself in src/x402/validate.ts — the file is short, well-commented, and the entire pre-sign hook fits on a single screen.

Limitations to know

The honest disclosures matter more than the marketing claims. A few things this package doesn't do, that you should know before relying on it:

Where next

The sister package @paladinfi/eliza-plugin-trust ships the same trust-check semantic for ElizaOS agents. Both share the same security architecture (hard-coded constants + pre-sign hook); a CI drift check enforces byte-for-byte parity on the security-critical files.

Source, demo walkthrough, and the full security model are public:

For agents already running in production: start in preview mode, observe the recommendations for a week against your normal traffic, then flip to paid mode once you trust the layer's behavior. The cost at scale is bounded: $0.001 times the number of swaps you check each week. The failure mode is warn-fail-closed, so the system degrades gracefully rather than going dark or silently allowing.

The pattern matters more than this specific package. Whatever you pick (this one, a competing tool, or your own internal check), pre-trade trust verification is increasingly load-bearing for agents handling user funds. What generalizes beyond any one implementation is the architecture: an independent verification destination, hard-coded constants, and a structured recommendation.

Operated by Malcontent Games LLC (Michigan, BF1971980-1), doing business as PaladinFi. Public API at swap.paladinfi.com; operational status at swap.paladinfi.com/health; terms at paladinfi.com/terms. MIT-licensed source.