How to Add a Pre-Trade Trust Gate to Your Coinbase AgentKit Agent

2026-05-15 · Tutorial agentkit x402 agent-security

When an LLM-driven agent makes an on-chain swap on behalf of a user, the agent fills in the destination, the amount, the slippage tolerance — and then signs. There's no human in the loop reading the transaction back before broadcast. Whatever the agent decided to do, the wallet does.

That's the safety surface. The agent's decision-making is only as good as the inputs it was given. If a malicious instruction enters the prompt context, or a token address comes from a poisoned data source, the agent will faithfully execute against the wrong destination.

The defense is straightforward in shape and surprisingly underused in practice: run a verification check between "agent has formed a plan" and "wallet signs the transaction." This post walks through wiring that check into a Coinbase AgentKit agent on Base using a single ActionProvider, with no changes to the agent loop itself.

The package is @paladinfi/agentkit-actions (published on npm, with source on GitHub), MIT-licensed, currently at v0.1.1. It exposes one tool to your agent: paladin_trust_check. The LLM decides when to call it, typically before a swap or transfer.

The verification pattern

The pattern itself isn't novel. Any production trading system runs sanity checks before signing — price oracles, slippage bounds, balance precondition checks. The shape AI agents need is a generalization:

Agent forms a plan that includes signing a transaction (swap, transfer, contract call).
Before signing, ask: do we have any reason to believe the destination or the asset is compromised?
The check runs against state the agent did not control — independent on-chain data, third-party threat intel, regulatory lists.
Result drives one of three branches: allow (proceed), warn (surface the concern to the user; agent should not auto-execute), block (abstain entirely).

The critical word is independent. If the check is performed by the same source that produced the suspicious instruction, it's not a check; it's part of the attack surface. The verification call must go to a destination that a malicious prompt cannot redirect. In practice that means the trust gate's endpoint is hard-coded in the ActionProvider, not parameterized from the agent's input.
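To make "hard-coded, not parameterized" concrete, here is a minimal sketch of a request builder for a trust-check tool. It is not the package's source: `TrustCheckInput`, `buildTrustCheckRequest`, and the JSON body shape are hypothetical. The point it illustrates is that the URL is a module constant, and no field of the LLM-supplied input can reach it.

```typescript
// Hypothetical sketch, not @paladinfi/agentkit-actions source.
// The endpoint is a module-level constant; nothing the LLM supplies can change it.
const TRUST_CHECK_PREVIEW_URL = "https://swap.paladinfi.com/v1/trust-check/preview";

// Only the fields the check needs. Extra keys the LLM adds (for example an
// injected "url" or "endpoint") are dropped, never forwarded.
interface TrustCheckInput {
  address: string;
  chainId: number;
}

interface TrustCheckRequest {
  url: string;
  init: { method: "POST"; headers: Record<string, string>; body: string };
}

function buildTrustCheckRequest(raw: unknown): TrustCheckRequest {
  if (typeof raw !== "object" || raw === null) {
    throw new Error("trust-check input must be an object");
  }
  const { address, chainId } = raw as Record<string, unknown>;
  if (typeof address !== "string" || !/^0x[0-9a-fA-F]{40}$/.test(address)) {
    throw new Error("trust-check input: address must be a 20-byte hex string");
  }
  if (chainId !== 8453) {
    throw new Error("trust-check input: only Base (chainId 8453) is supported");
  }
  const input: TrustCheckInput = { address, chainId };
  return {
    url: TRUST_CHECK_PREVIEW_URL, // never read from input
    init: {
      method: "POST",
      headers: { "content-type": "application/json" },
      body: JSON.stringify(input),
    },
  };
}
```

An input carrying an injected `url: "https://attacker.example"` yields exactly the same request as a clean one, because the builder never reads that key.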

@paladinfi/agentkit-actions enforces this by construction — the endpoint, the USDC contract for x402 payment, the treasury address, the payment amount cap, and the EIP-712 domain are all hard-coded into the package's source. A compromised PaladinFi server cannot redirect this action's signed x402 authorization to a different recipient, asset, or chain — the constants are validated client-side before viem signs. (This guarantee scopes to the trust-check action only; supply-chain integrity of the npm package itself is a separate concern — verify the published constants in src/x402/validate.ts and pin the version.)

Installation

```shell
npm install @paladinfi/agentkit-actions
# or pnpm add / bun add
```

Peer dependency: @coinbase/agentkit@0.10.4 (pinned exactly — AgentKit's API is still stabilizing).

Operated by Malcontent Games LLC (Michigan); dev@paladinfi.com for security disclosure. Source MIT-licensed at github.com/paladinfi/agentkit-actions.

If you're migrating from @paladinfi/agentkit-actions@0.0.x, drop the walletClientAccount: ... argument — paid mode now wires through the AgentKit wallet provider's toSigner() automatically. The legacy argument throws with a migration message at boot.

Wiring into AgentKit (preview mode)

Preview mode is free, returns sample-fixture data, and is the right starting point for development. It lets you confirm the tool registers, the LLM uses it appropriately, and the response shape integrates with your agent's reasoning before you wire up payment.

```typescript
import { AgentKit } from "@coinbase/agentkit";
import { paladinTrustActionProvider } from "@paladinfi/agentkit-actions";
import { getLangChainTools } from "@coinbase/agentkit-langchain";
// or: import { getVercelAITools } from "@coinbase/agentkit-vercel-ai-sdk";

const agentkit = await AgentKit.from({
  walletProvider, // your existing EvmWalletProvider on Base
  actionProviders: [
    paladinTrustActionProvider(), // mode: "preview" by default
    // ...your other providers
  ],
});

const tools = await getLangChainTools(agentkit);
```

That's the full integration. The agent now has a paladin_trust_check tool. When the LLM decides to call it (e.g., before a swap), it sends POST /v1/trust-check/preview to swap.paladinfi.com, with no auth and no payment.

Preview responses are explicitly marked as samples: every factor has real: false, and the recommendation is sample- prefixed (sample-allow / sample-warn / sample-block). This prevents a preview-mode screenshot from being cropped into a misleading "real" assessment downstream. You can develop and demo against preview indefinitely; only flip to paid when production calls need to drive real decisions.
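If the same agent code runs in both modes, one guard that refuses to act on sample data keeps a preview verdict from ever authorizing a trade. A minimal sketch, assuming the field names from the preview response shown in the next section (`recommendation`, `_preview`); `PreviewMarkers` and `isSampleVerdict` are hypothetical names, not package exports.

```typescript
// Hypothetical guard: true if a trust block is sample (preview) data.
interface PreviewMarkers {
  recommendation: string;
  _preview?: boolean;
}

function isSampleVerdict(trust: PreviewMarkers): boolean {
  // Either marker is enough. Per-factor `real: false` is deliberately NOT used
  // as a preview marker: paid responses also set it on unreachable sources.
  return trust._preview === true || trust.recommendation.startsWith("sample-");
}
```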

What the agent sees

When the LLM calls paladin_trust_check with a token address, it gets a structured response. Here's the actual preview-mode shape (live response from swap.paladinfi.com/v1/trust-check/preview against Base USDC):

```json
{
  "address": "0x833589fCD6eDb6E08f4c7C32D4f71b54bdA02913",
  "chainId": 8453,
  "taker": null,
  "request_id": "b7795c5e-ae26-4e44-b78b-750ba8767435",
  "trust": {
    "risk_score": null,
    "risk_score_scale": "0-100 (lower = safer); null on preview because no live evaluation runs",
    "recommendation": "sample-allow",
    "recommendation_enum": ["allow", "warn", "block"],
    "factors": [
      { "source": "ofac",             "signal": "not_listed", "details": "Live on paid endpoint; not evaluated on preview", "real": false },
      { "source": "etherscan_source", "signal": "verified",   "details": "SAMPLE — illustrative only",                       "real": false },
      { "source": "goplus",           "signal": "ok",         "details": "SAMPLE — illustrative only",                       "real": false },
      { "source": "anomaly",          "signal": "ok",         "details": "SAMPLE — illustrative only",                       "real": false }
    ],
    "version": "1.1",
    "_preview": true
  }
}
```

In paid mode (the production response from /v1/trust-check), the shape is identical, with three differences: recommendation is one of allow / warn / block (no sample- prefix), each factor's real field is true for successfully evaluated sources, and risk_score is a number on the 0-100 scale.
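Since the response crosses a trust boundary, it can help to give it a type and validate it before the agent reasons over it. The sketch below is written from the example response above; the types and `parseTrustCheck` are hypothetical client-side helpers, not exports of @paladinfi/agentkit-actions, and they ignore informational fields such as `risk_score_scale` and `recommendation_enum`.

```typescript
// Hypothetical client-side types, derived from the example response in this post.
type Recommendation = "allow" | "warn" | "block";
type SampleRecommendation = `sample-${Recommendation}`;

interface TrustFactor {
  source: string;  // "ofac" | "goplus" | "etherscan_source" | "anomaly" today
  signal: string;  // e.g. "ok", "verified", "not_listed", "unreachable"
  details: string;
  real: boolean;   // true only for a live, successful evaluation
}

interface TrustCheckResponse {
  address: string;
  chainId: number;
  taker: string | null;
  request_id: string;
  trust: {
    risk_score: number | null; // 0-100, lower = safer; null on preview
    recommendation: Recommendation | SampleRecommendation;
    factors: TrustFactor[];
    version: string;
    _preview?: boolean;
  };
}

const KNOWN_RECOMMENDATIONS = new Set<string>([
  "allow", "warn", "block", "sample-allow", "sample-warn", "sample-block",
]);

// Narrow an untrusted JSON value; throw rather than guess on unexpected input.
function parseTrustCheck(json: unknown): TrustCheckResponse {
  const r = json as TrustCheckResponse;
  if (typeof r !== "object" || r === null || typeof r.trust !== "object" || r.trust === null) {
    throw new Error("trust-check: missing trust block");
  }
  if (!KNOWN_RECOMMENDATIONS.has(r.trust.recommendation)) {
    throw new Error(`trust-check: unknown recommendation ${String(r.trust.recommendation)}`);
  }
  if (!Array.isArray(r.trust.factors) || !r.trust.factors.every((f) => typeof f?.real === "boolean")) {
    throw new Error("trust-check: malformed factors");
  }
  const score = r.trust.risk_score;
  if (score !== null && (typeof score !== "number" || score < 0 || score > 100)) {
    throw new Error("trust-check: risk_score out of range");
  }
  return r;
}
```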

recommendation is the actionable summary. The intended pattern: agent proceeds on allow, surfaces a warning to the user on warn, abstains entirely on block. The factors array lets the agent (or you, in a system prompt) reason about why — useful when you want to log decisions, route warns through human-confirmation, or weight specific factors differently for your use case.
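For the logging use case, a minimal sketch of a helper that records the server's recommendation as the decision and keeps the factors only as context. `trustLogLine` and the log format are hypothetical. Listing unevaluated sources separately makes it easy to tell a warn caused by an outage from a warn caused by a flagged destination.

```typescript
// Hypothetical log helper; the decision is always the server's recommendation.
interface LoggedFactor {
  source: string;
  signal: string;
  real: boolean;
}

function trustLogLine(recommendation: string, factors: LoggedFactor[]): string {
  // Sources that were actually evaluated, with their signal.
  const evaluated = factors.filter((f) => f.real).map((f) => `${f.source}=${f.signal}`);
  // Sources that were not evaluated (preview samples or unreachable upstreams).
  const notEvaluated = factors.filter((f) => !f.real).map((f) => f.source);
  return [
    `recommendation=${recommendation}`,
    `evaluated=[${evaluated.join(",")}]`,
    `not_evaluated=[${notEvaluated.join(",")}]`,
  ].join(" ");
}
```

For example, `trustLogLine("warn", [{ source: "ofac", signal: "not_listed", real: true }, { source: "goplus", signal: "unreachable", real: false }])` returns `recommendation=warn evaluated=[ofac=not_listed] not_evaluated=[goplus]`.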

A few specific behaviors worth knowing:

Fail-closed contract. If all underlying sources are temporarily unreachable on a paid call, the response returns recommendation: "warn" (never silent-allow). Individual unreachable sources show up as factors with real: false and signal: "unreachable". Your agent should treat warn as "do not auto-execute" by default.
OFAC override is absolute (when reachable). If the OFAC SDN check matches and the source is reachable, recommendation is block regardless of other factors. There is no weighting around this. (When the OFAC source itself is unreachable, the fail-closed contract above returns warn, not silent-allow — see the previous bullet.)
Recommendation is the source of truth. Don't try to recompute it from per-factor signals. The server applies overrides (OFAC, fail-closed), and the contract version field (version: "1.1") signals when the semantics tighten across releases. Reading recommendation directly is the stable integration point.
Preview is explicitly marked. recommendation is prefixed sample-, every factor has real: false, and _preview: true is set on the trust block. This is intentional — a preview-mode screenshot cannot be cropped into looking like a real assessment downstream.
Transport failures. If swap.paladinfi.com is fully unreachable (DNS, TLS, or 5xx), the ActionProvider surfaces a tagged error to the agent. Your agent's retry/fallback policy decides whether to abort the swap or proceed with degraded confidence. We recommend abort.
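Taken together, these behaviors reduce to a small agent-side decision function. A minimal sketch, assuming the agent sees either a recommendation string or a thrown error from the tool call; `TrustOutcome`, `GateAction`, and `decideGate` are hypothetical names. It reads only `recommendation`, never the factors, and treats errors, sample verdicts, and unknown values as abstain.

```typescript
// Hypothetical agent-side policy. "execute" is the only outcome that lets
// the agent sign without a human in the loop.
type GateAction = "execute" | "confirm" | "abstain";

type TrustOutcome =
  | { ok: true; recommendation: string }
  | { ok: false; error: unknown }; // e.g. the ActionProvider's tagged transport error

function decideGate(outcome: TrustOutcome): GateAction {
  if (!outcome.ok) return "abstain"; // endpoint unreachable: abort the swap
  switch (outcome.recommendation) {
    case "allow":
      return "execute";
    case "warn":
      return "confirm"; // includes fail-closed warns; never auto-execute
    case "block":
      return "abstain";
    default:
      return "abstain"; // sample-* verdicts and unknown values never authorize a trade
  }
}
```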

What gets checked

The trust gate is composed of four independent signals, each evaluated by the paid endpoint:

| `source` value | Upstream | What it catches |
|---|---|---|
| `ofac` | U.S. Treasury SDN XML feed (cryptocurrency-tagged via Feature 345 / Detail 1432) | Sanctioned-address swaps. Refreshed every 24h from Treasury. A match here forces `recommendation: block` regardless of other signals. |
| `goplus` | GoPlus trust-list + token-security API | Known-malicious contract patterns: honeypots, blacklist functions, owner-can-mint, transfer-pause, hidden-fee, etc. |
| `etherscan_source` | Etherscan `getSourceCode` | Unverified-source contracts (a weak but consistent signal: most legitimate tokens are source-verified within days of deploy). |
| `anomaly` | On-chain analysis | Contract age windows (under 1h / 24h / 7d), address-kind classification (contract vs. EOA via `eth_getCode`), and no-outbound transaction history. |

This isn't a magic detector for every kind of malicious token. It's a layered gate where each layer catches a different class of failure, and the layered response gives the agent (or operator) enough information to decide the right action.
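The anomaly row is the easiest to picture in code. A minimal sketch of two of its ingredients, assuming you already have the contract's deployment timestamp and the raw `eth_getCode` result (for example from viem's `getCode`, which returns `undefined` when an address has no code). The bucket names and `ageWindow` / `addressKind` are hypothetical; this is an illustration, not PaladinFi's server logic.

```typescript
// Hypothetical illustration of two anomaly-style heuristics.
type AgeWindow = "under_1h" | "under_24h" | "under_7d" | "older";
type AddressKind = "contract" | "eoa";

const HOUR_SECONDS = 60 * 60;

// Bucket a contract's age into the windows named in the table above.
function ageWindow(deployedAtSec: number, nowSec: number): AgeWindow {
  const age = nowSec - deployedAtSec;
  if (age < HOUR_SECONDS) return "under_1h";
  if (age < 24 * HOUR_SECONDS) return "under_24h";
  if (age < 7 * 24 * HOUR_SECONDS) return "under_7d";
  return "older";
}

// eth_getCode returns "0x" for an address with no deployed code; viem maps
// that to undefined. Either means "not a contract". Note that an EOA with an
// EIP-7702 delegation also returns non-empty code and lands in "contract" here.
function addressKind(code: string | undefined): AddressKind {
  return code === undefined || code === "0x" ? "eoa" : "contract";
}
```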

The signals you won't find here, by design:

LP-lock status (we don't yet trust the data sources well enough to weight this)
Deployer rug history (graph-link analysis is computationally expensive, and the resulting signal is noisy)
Pump-dump / wash-trade signals (these are post-hoc — useful for forensics, not pre-trade verification)

This is one of the honest tradeoffs of a pre-trade gate: you can only block what you can detect quickly. Signals that require deep analysis fit a different layer of the stack.

Switching to paid mode

Once you've confirmed the integration works in preview, switch to paid mode for production. The flag is one line:

```typescript
const agentkit = await AgentKit.from({
  walletProvider,
  actionProviders: [
    paladinTrustActionProvider({ mode: "paid" }),
  ],
});
```

Paid mode costs $0.001 USDC per call on Base, settled via x402 using EIP-3009. The agent's wallet (any EvmWalletProvider: Viem, CDP, Privy, etc.) signs an EIP-3009 authorization; the x402 facilitator submits the on-chain transfer and pays gas. No ETH is required in the agent's wallet, because EIP-3009 settlement is gasless from the signer's perspective. Fund the agent's wallet with about $0.10 USDC (100 calls at the declared price), plus a buffer, because the client-side cap allows a single call to be charged up to $0.01 (see the pre-sign safety section below).
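The funding guidance is easier to reason about in USDC base units, since x402 challenges carry integer amounts. A worked sketch of that arithmetic, assuming only that USDC uses 6 decimals; `fundingRange` and the constant names are hypothetical.

```typescript
// Worked arithmetic for the funding guidance. USDC has 6 decimals, so
// 1 USDC = 1_000_000 base units. Plain numbers are exact at these magnitudes.
const USDC_DECIMALS = 6;
const UNITS_PER_USDC = 10 ** USDC_DECIMALS;

const DECLARED_PRICE_UNITS = 1_000; // $0.001 per call (declared price)
const CLIENT_CAP_UNITS = 10_000;    // $0.01 per call (client-side ceiling)

function toUsd(units: number): number {
  return units / UNITS_PER_USDC;
}

// Expected spend at the declared price, and the worst case if every call
// were charged at the client-side cap.
function fundingRange(calls: number): { expectedUnits: number; worstCaseUnits: number } {
  return {
    expectedUnits: calls * DECLARED_PRICE_UNITS,
    worstCaseUnits: calls * CLIENT_CAP_UNITS,
  };
}
```

For 100 calls, `fundingRange(100)` gives 100,000 units ($0.10) expected and 1,000,000 units ($1.00) worst case. How much buffer to hold between those two numbers depends on how much overcharge exposure you are willing to accept.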

Paid responses return live evaluations with real: true on successful factors. The schema is otherwise identical to preview — same fields, same recommendation semantics, same fail-closed contract.

The pre-sign safety guarantee

This is the differentiator that justifies the package existing alongside AgentKit's generic x402ActionProvider.

A generic x402 transport lets your agent make HTTP-via-x402 calls to any URL the LLM produces. The wallet signs whatever authorization the server's 402 challenge demands. If the server is compromised, or the URL is rewritten by a prompt-injection attack, the signed authorization could go to an unintended recipient.

@paladinfi/agentkit-actions validates the server's 402 challenge against hard-coded constants before viem signs:

USDC contract: 0x833589fCD6eDb6E08f4c7C32D4f71b54bdA02913 (Base USDC, hard-coded)
Treasury (recipient): 0xeA8C33d018760D034384e92D1B2a7cf0338834b4 (PaladinFi treasury, hard-coded)
Max amount: $0.01 per call (a defense-in-depth ceiling above the declared $0.001 price; a compromised server could charge up to 10× the declared price without tripping the check, but never an arbitrary amount)
Settlement: EIP-3009 only (no Permit2, no infinite-approval patterns)
Validity window: ≤10 minutes from challenge (MAX_VALIDITY_SECONDS = 600 in src/x402/constants.ts)

If any field deviates, the call aborts client-side before viem signs, with an error prefixed paladin-trust BLOCKED pre-sign: so operators can grep and alert. Within this action's scope, a compromised PaladinFi server cannot redirect a signed authorization to a different recipient, asset, or chain — payment-path safety holds even if the server is fully adversarial.
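As an illustration of what such a pre-sign hook checks, here is a minimal sketch modeled loosely on the x402 v1 payment-requirements fields (`scheme`, `network`, `asset`, `payTo`, `maxAmountRequired`, `maxTimeoutSeconds`). It is not `src/x402/validate.ts`. The interface name, the mapping of the validity window onto `maxTimeoutSeconds`, and `assertSafeToSign` are assumptions; read the package file for the real checks. The important property is that every check throws before any signing call runs.

```typescript
// Hypothetical pre-sign validator; not the package's validate.ts.
interface PaymentRequirementsLike {
  scheme: string;            // "exact" is the EIP-3009 scheme on EVM
  network: string;           // x402 network name; "base" for Base mainnet
  asset: string;
  payTo: string;
  maxAmountRequired: string; // integer base units, as a decimal string
  maxTimeoutSeconds: number; // assumed here to carry the validity window
}

const EXPECTED_ASSET = "0x833589fCD6eDb6E08f4c7C32D4f71b54bdA02913"; // Base USDC
const EXPECTED_PAY_TO = "0xeA8C33d018760D034384e92D1B2a7cf0338834b4"; // PaladinFi treasury
const MAX_AMOUNT_UNITS = 10_000; // $0.01 at 6 decimals
const MAX_VALIDITY_SECONDS = 600;

// Throw with the greppable prefix; `never` tells TypeScript control stops here.
function blocked(reason: string): never {
  throw new Error(`paladin-trust BLOCKED pre-sign: ${reason}`);
}

const sameAddress = (a: string, b: string): boolean => a.toLowerCase() === b.toLowerCase();

function assertSafeToSign(req: PaymentRequirementsLike): void {
  if (req.scheme !== "exact") blocked(`unexpected scheme ${req.scheme}`);
  if (req.network !== "base") blocked(`unexpected network ${req.network}`);
  if (!sameAddress(req.asset, EXPECTED_ASSET)) blocked(`unexpected asset ${req.asset}`);
  if (!sameAddress(req.payTo, EXPECTED_PAY_TO)) blocked(`unexpected recipient ${req.payTo}`);
  // Reject exponents, decimals, and signs before converting to a number.
  if (!/^\d+$/.test(req.maxAmountRequired)) blocked("amount is not a base-unit integer");
  const amount = Number(req.maxAmountRequired);
  if (!Number.isSafeInteger(amount) || amount > MAX_AMOUNT_UNITS) {
    blocked(`amount ${req.maxAmountRequired} exceeds cap`);
  }
  if (!(req.maxTimeoutSeconds > 0 && req.maxTimeoutSeconds <= MAX_VALIDITY_SECONDS)) {
    blocked(`validity window ${req.maxTimeoutSeconds}s out of bounds`);
  }
}
```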

A separate question worth answering directly: can a compromised PaladinFi server drain my agent's wallet via the trust verdict itself? The failure mode if our server is compromised is false-allow (returning recommendation: "allow" on a malicious token), not direct fund-loss. The trust-verdict logic is the part where you have to trust our evaluation — which is why the source is public, the response shape is versioned (trust.version), and the architectural mitigation is layered defense: this gate handles "is this destination known-bad"; your agent should also enforce its own spending caps, allowlists for high-value transfers, and slippage tolerances. Defense-in-depth is the rule; this is one layer of it.

The hard-coding is the architectural commitment. If we ever need to change the treasury or the price, it takes a package version bump (visible in the npm diff, and not picked up by installs that pin the exact version), not a server-side config flip. Operators downstream see the change and accept it explicitly.
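That explicit acceptance only happens if your project pins the package exactly. A minimal sketch of a CI guard for this, assuming you load your package.json dependencies into a plain object; `unpinned` and the regex are hypothetical helpers, and a committed lockfile installed with `npm ci` remains the complementary control for transitive dependencies.

```typescript
// Hypothetical CI guard: report security-critical dependencies that are not
// pinned to an exact version (no ^, ~, ranges, tags, or missing entries).
const EXACT_SEMVER = /^\d+\.\d+\.\d+(?:-[0-9A-Za-z.-]+)?$/;

function unpinned(deps: Record<string, string>, critical: string[]): string[] {
  return critical.filter((name) => {
    const spec = deps[name];
    return spec === undefined || !EXACT_SEMVER.test(spec);
  });
}
```

A CI step can fail whenever `unpinned(pkg.dependencies, ["@paladinfi/agentkit-actions", "@coinbase/agentkit"])` returns a non-empty list.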

You can verify the constants yourself in src/x402/validate.ts — the file is short, well-commented, and the entire pre-sign hook fits on a single screen.

Limitations to know

The honest disclosures matter more than the marketing claims. A few things this package doesn't do, that you should know before relying on it:

Coverage is not exhaustive. GoPlus signals are a leading indicator; recently deployed contracts may not yet be classified. A scam token that launches and rugs within 6 hours can clear our checks before GoPlus has indexed it. The anomaly-heuristics layer is a partial mitigation (contract-age windows fire on recently deployed contracts from a different angle), but a 6-hour-old contract with a verified source can clear our checks and still be malicious.
Base only. supportsNetwork rejects all other networks. Multichain expansion is on the roadmap; the rate-limit signals on Ethereum L1 vs. Base differ, so we'd rather ship per-chain coverage with chain-aware heuristics than a multichain mode that re-uses Base assumptions.
AgentKit alpha drift. Tested against @coinbase/agentkit@0.10.4. AgentKit's API is still evolving; minor releases may break the integration. The peer-dep pin is exact for this reason. We bump the pin as new AgentKit minors land and the integration is re-verified.
Latency. A small client-side measurement from us-east-2 against swap.paladinfi.com/v1/trust-check/preview shows TTFB consistently around 90-110ms. Measure your own RTT before treating any number as authoritative — your agent's geographic position matters more than ours. The endpoint is not on a CDN today, so RTT is roughly the speed-of-light bound from your caller to us-east-2.
Rate limits. Public rate-limit thresholds are not yet published. Today's deployed server applies per-IP rate-limiting on the preview endpoint and per-wallet rate-limiting on the paid endpoint; the specific thresholds are intentionally not surfaced while we calibrate against real traffic patterns. If you hit a 429, the response carries a structured retry_after field. Long-term we'll publish floors in a public rate-limit reference doc.
The trust gate is not the only layer. A production agent should also enforce: spending caps, allowlists for high-value transfers, slippage tolerances, time-of-day controls, and human-confirmation for transactions above a threshold. The trust gate handles "is this destination known-bad"; the other layers handle "is this transaction within scope for this agent." Both matter.
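A minimal sketch of that second layer, combined with the trust gate, might look like the following. All thresholds, `AgentPolicy`, `checkPolicy`, and `combineLayers` are hypothetical placeholders (time-of-day controls are omitted for brevity). The design choice is that either layer can veto and both must agree before anything auto-executes.

```typescript
// Hypothetical agent-side policy layered on top of the trust gate.
interface TransferIntent {
  to: string;
  amountUsd: number;
}

interface AgentPolicy {
  perTxCapUsd: number;          // hard cap per transaction
  dailyCapUsd: number;          // hard cap per day
  confirmAboveUsd: number;      // human confirmation above this amount
  highValueAllowlist: string[]; // only these recipients above confirmAboveUsd
}

type PolicyDecision = "allow" | "confirm" | "deny";

function checkPolicy(intent: TransferIntent, spentTodayUsd: number, policy: AgentPolicy): PolicyDecision {
  if (intent.amountUsd > policy.perTxCapUsd) return "deny";
  if (spentTodayUsd + intent.amountUsd > policy.dailyCapUsd) return "deny";
  if (intent.amountUsd > policy.confirmAboveUsd) {
    const listed = policy.highValueAllowlist.some((a) => a.toLowerCase() === intent.to.toLowerCase());
    return listed ? "confirm" : "deny";
  }
  return "allow";
}

// A trade auto-executes only if the trust gate says "allow" AND the policy says "allow".
function combineLayers(trust: "allow" | "warn" | "block", policy: PolicyDecision): PolicyDecision {
  if (trust === "block" || policy === "deny") return "deny";
  if (trust === "warn" || policy === "confirm") return "confirm";
  return "allow";
}
```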

Where next

The sister package @paladinfi/eliza-plugin-trust ships the same trust-check semantics for ElizaOS agents. Both share the same security architecture (hard-coded constants + pre-sign hook); a CI drift check enforces byte-for-byte parity on the security-critical files.

Source, demo walkthrough, and the full security model are public:

Source: github.com/paladinfi/agentkit-actions
DEMO.md: github.com/paladinfi/agentkit-actions/blob/main/DEMO.md — includes the on-chain settlement tx hash (Basescan link), full request/response capture, and the EIP-3009 authorization payload that was signed
Vulnerability disclosure: SECURITY.md — email dev@paladinfi.com; please do not open public issues for security findings

For agents already running in production: start in preview mode, observe for a week how the agent invokes the tool and reacts to its (sample) responses against your normal traffic, then flip to paid mode once you trust the layer's behavior. The cost at scale is bounded at $0.001 per trust check, so roughly $0.001 × the number of swaps you gate per week, and the failure mode is fail-closed (warn), so the system degrades gracefully rather than going dark or silently allowing.

The architectural pattern matters more than this specific package. Whatever you pick (this one, a competing tool, or your own internal check), pre-trade trust verification is increasingly load-bearing for agents handling user funds, and the pattern's three elements (an independent verification destination, hard-coded constants, a structured recommendation) carry over to any implementation.

Operated by Malcontent Games LLC (Michigan, BF1971980-1), doing business as PaladinFi. Public API at swap.paladinfi.com; operational status at swap.paladinfi.com/health; terms at paladinfi.com/terms. MIT-licensed source.