How to Add a Pre-Trade Trust Gate to Your Coinbase AgentKit Agent
When an LLM-driven agent makes an on-chain swap on behalf of a user, the agent fills in the destination, the amount, the slippage tolerance — and then signs. There's no human in the loop reading the transaction back before broadcast. Whatever the agent decided to do, the wallet does.
That's the safety surface. The agent's decision-making is only as good as the inputs it was given. If a malicious instruction enters the prompt context, or a token address comes from a poisoned data source, the agent will faithfully execute against the wrong destination.
The defense is straightforward in shape and surprisingly underused in
practice: run a verification check between "agent has formed a
plan" and "wallet signs the transaction." This post walks through
wiring that check into a Coinbase AgentKit agent on Base using a single
ActionProvider, with no changes to the agent loop itself.
The package is @paladinfi/agentkit-actions (available on npm, source on
GitHub), MIT-licensed, currently at v0.1.1. It exposes one tool to your
agent: paladin_trust_check. The LLM decides when to call it, typically
before a swap or transfer.
The verification pattern
The pattern itself isn't novel. Any production trading system runs sanity checks before signing — price oracles, slippage bounds, balance precondition checks. The shape AI agents need is a generalization:
- Agent forms a plan that includes signing a transaction (swap, transfer, contract call).
- Before signing, ask: do we have any reason to believe the destination or the asset is compromised?
- The check runs against state the agent did not control — independent on-chain data, third-party threat intel, regulatory lists.
- Result drives one of three branches: allow (proceed), warn (surface the concern to the user; agent should not auto-execute), block (abstain entirely).
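As a rough illustration of the three-branch result, the mapping from verdict to agent behavior might look like the sketch below. The `NextStep` names and the `nextStep` helper are hypothetical and not part of the package; the choice to route any unrecognized value to the user rather than execute is an assumption that follows the "should not auto-execute" rule.

```typescript
// The three verdicts named in the list above.
type Recommendation = "allow" | "warn" | "block";

// Hypothetical labels for what the agent does next.
type NextStep = "proceed" | "ask_user" | "abstain";

const STEP_FOR: Record<Recommendation, NextStep> = {
  allow: "proceed",
  warn: "ask_user",
  block: "abstain",
};

function isRecommendation(value: string): value is Recommendation {
  return value === "allow" || value === "warn" || value === "block";
}

// Anything outside the three known verdicts never auto-executes.
function nextStep(value: string): NextStep {
  return isRecommendation(value) ? STEP_FOR[value] : "ask_user";
}
```

Keeping the mapping in one small function makes it easy to audit that no code path turns an unknown verdict into an automatic signature.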
The critical word is independent. If the check is performed by
the same source that produced the suspicious instruction, it's not a check;
it's part of the attack surface. The verification call must go to a service
that a malicious prompt could not have talked the agent into choosing.
In practice that means: the trust gate's endpoint is hard-coded in the
ActionProvider, not parameterized from the agent's input.
@paladinfi/agentkit-actions enforces this by construction —
the endpoint, the USDC contract for x402 payment, the treasury address,
the payment amount cap, and the EIP-712 domain are all hard-coded into the
package's source. A compromised PaladinFi server cannot redirect
this action's signed x402 authorization to a different recipient,
asset, or chain — the constants are validated client-side before viem
signs. (This guarantee scopes to the trust-check action only; supply-chain
integrity of the npm package itself is a separate concern — verify the
published constants in
src/x402/validate.ts
and pin the version.)
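To make the "hard-coded, not parameterized" idea concrete, here is a minimal sketch of a request builder where only the token address comes from the agent. It is not the package's code: `buildTrustCheckRequest` and the request body field names are assumptions for illustration, while the URL and chain ID are the ones quoted in this post.

```typescript
// Hard-coded on purpose: nothing the LLM produces can change this URL.
const TRUST_CHECK_URL = "https://swap.paladinfi.com/v1/trust-check/preview";
const BASE_CHAIN_ID = 8453;

interface TrustCheckRequest {
  url: string;
  method: "POST";
  body: string;
}

// Only the address is agent-controlled, and it is format-checked first.
function buildTrustCheckRequest(address: string): TrustCheckRequest {
  if (!/^0x[0-9a-fA-F]{40}$/.test(address)) {
    throw new Error(`not a 20-byte hex address: ${address}`);
  }
  return {
    url: TRUST_CHECK_URL,
    method: "POST",
    // Body field names are assumed for illustration.
    body: JSON.stringify({ address, chainId: BASE_CHAIN_ID }),
  };
}
```

Because the function signature has no URL parameter, a prompt-injected "verify this at another endpoint" instruction has nothing to bind to.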
Installation
npm install @paladinfi/agentkit-actions
# or pnpm add / bun add
Peer dependency: @coinbase/agentkit@0.10.4 (pinned exactly —
AgentKit's API is still stabilizing).
Operated by Malcontent Games LLC (Michigan);
dev@paladinfi.com
for security disclosure. Source MIT-licensed at
github.com/paladinfi/agentkit-actions.
If you're migrating from @paladinfi/agentkit-actions@0.0.x,
drop the walletClientAccount: ... argument — paid mode now
wires through the AgentKit wallet provider's toSigner()
automatically. The legacy argument throws with a migration message at
boot.
Wiring into AgentKit (preview mode)
Preview mode is free, returns sample-fixture data, and is the right starting point for development. It lets you confirm the tool registers, the LLM uses it appropriately, and the response shape integrates with your agent's reasoning before you wire up payment.
import { AgentKit } from "@coinbase/agentkit";
import { paladinTrustActionProvider } from "@paladinfi/agentkit-actions";
import { getLangChainTools } from "@coinbase/agentkit-langchain";
// or: import { getVercelAITools } from "@coinbase/agentkit-vercel-ai-sdk";
const agentkit = await AgentKit.from({
  walletProvider, // your existing EvmWalletProvider on Base
  actionProviders: [
    paladinTrustActionProvider(), // mode: "preview" by default
    // ...your other providers
  ],
});
const tools = await getLangChainTools(agentkit);
That's the full integration. The agent now has a
paladin_trust_check tool. When the LLM decides to call it
(e.g., before a swap), it'll hit
POST /v1/trust-check/preview against
swap.paladinfi.com, no auth, no payment.
Preview responses are explicitly marked as samples: every factor has
real: false, and the recommendation is sample-
prefixed (sample-allow / sample-warn /
sample-block). This prevents a preview-mode screenshot from
being cropped into a misleading "real" assessment downstream. You can
develop and demo against preview indefinitely; only flip to paid when
production calls need to drive real decisions.
What the agent sees
When the LLM calls paladin_trust_check with a token address,
it gets a structured response. Here's the actual preview-mode shape (live
response from
swap.paladinfi.com/v1/trust-check/preview against Base USDC):
{
"address": "0x833589fCD6eDb6E08f4c7C32D4f71b54bdA02913",
"chainId": 8453,
"taker": null,
"request_id": "b7795c5e-ae26-4e44-b78b-750ba8767435",
"trust": {
"risk_score": null,
"risk_score_scale": "0-100 (lower = safer); null on preview because no live evaluation runs",
"recommendation": "sample-allow",
"recommendation_enum": ["allow", "warn", "block"],
"factors": [
{ "source": "ofac", "signal": "not_listed", "details": "Live on paid endpoint; not evaluated on preview", "real": false },
{ "source": "etherscan_source", "signal": "verified", "details": "SAMPLE — illustrative only", "real": false },
{ "source": "goplus", "signal": "ok", "details": "SAMPLE — illustrative only", "real": false },
{ "source": "anomaly", "signal": "ok", "details": "SAMPLE — illustrative only", "real": false }
],
"version": "1.1",
"_preview": true
}
}
In paid mode (the production response from
/v1/trust-check), the shape is identical, with three
differences: recommendation is one of allow /
warn / block (no sample- prefix),
each factor's real field is true for
successfully evaluated sources, and risk_score is a number on
the 0-100 scale.
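If you consume the response in your own code, a typed view of the trust block can help keep preview data from driving real decisions. The sketch below is written against the JSON shown above; the interface names and the `isPreview` / `liveRecommendation` helpers are hypothetical and not exported by the package.

```typescript
interface TrustFactor {
  source: string;
  signal: string;
  details: string;
  real: boolean;
}

interface TrustBlock {
  risk_score: number | null;
  recommendation: string;
  factors: TrustFactor[];
  version: string;
  _preview?: boolean;
}

type LiveRecommendation = "allow" | "warn" | "block";

// A response counts as preview if either marker is present.
function isPreview(trust: TrustBlock): boolean {
  return trust._preview === true || trust.recommendation.startsWith("sample-");
}

// Returns a verdict only for non-preview responses with a known value.
function liveRecommendation(trust: TrustBlock): LiveRecommendation | null {
  if (isPreview(trust)) return null;
  const r = trust.recommendation;
  return r === "allow" || r === "warn" || r === "block" ? r : null;
}
```

A `null` result here is meant to be handled like `warn`: do not auto-execute.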
recommendation is the actionable summary. The intended
pattern: agent proceeds on allow, surfaces a warning to the
user on warn, abstains entirely on block. The
factors array lets the agent (or you, in a system prompt)
reason about why — useful when you want to log decisions, route
warns through human-confirmation, or weight specific factors differently
for your use case.
A few specific behaviors worth knowing:
- Fail-closed contract. If all underlying sources are temporarily unreachable on a paid call, the response returns recommendation: "warn" (never silent-allow). Individual unreachable sources show up as factors with real: false and signal: "unreachable". Your agent should treat warn as "do not auto-execute" by default.
- OFAC override is absolute (when reachable). If the OFAC SDN check matches and the source is reachable, recommendation is block regardless of other factors. There is no weighting around this. (When the OFAC source itself is unreachable, the fail-closed contract above returns warn, not silent-allow; see the previous bullet.)
- Recommendation is the source of truth. Don't try to recompute it from per-factor signals: the server applies overrides (OFAC, fail-closed), and a contract version field (version: "1.1") signals when the semantics tighten across releases. Reading recommendation directly is the stable integration point.
- Preview is explicitly marked. recommendation is prefixed sample-, every factor has real: false, and _preview: true is set on the trust block. This is intentional: a preview-mode screenshot cannot be cropped into looking like a real assessment downstream.
- If swap.paladinfi.com is fully unreachable (DNS, TLS, or 5xx), the ActionProvider surfaces a tagged error to the agent. Your agent's retry/fallback policy decides whether to abort the swap or proceed with degraded confidence. We recommend abort.
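One way to encode these behaviors on the agent side is a single decision function that treats a failed call as an abort and a warn as a hold. This is a sketch under assumptions: `CheckResult`, `Outcome`, and `decide` are hypothetical names, and the caller is assumed to have wrapped the tool call in its own try/catch to produce the `ok: false` case.

```typescript
interface Factor {
  source: string;
  signal: string;
  real: boolean;
}

interface Verdict {
  recommendation: string;
  factors: Factor[];
}

// Either a parsed verdict or the message of the error the call raised.
type CheckResult = { ok: true; verdict: Verdict } | { ok: false; error: string };

interface Outcome {
  action: "proceed" | "hold" | "abort";
  reason: string;
}

function decide(result: CheckResult): Outcome {
  if (!result.ok) {
    // Endpoint unreachable: follow the recommended policy and abort.
    return { action: "abort", reason: `trust check unavailable: ${result.error}` };
  }
  const { recommendation, factors } = result.verdict;
  if (recommendation === "allow") return { action: "proceed", reason: "allow" };
  if (recommendation === "block") return { action: "abort", reason: "block" };
  // warn, or anything unrecognized: hold, and record which sources were down.
  const unreachable = factors
    .filter((f) => f.signal === "unreachable")
    .map((f) => f.source);
  const reason =
    unreachable.length > 0
      ? `${recommendation} (unreachable: ${unreachable.join(", ")})`
      : recommendation;
  return { action: "hold", reason };
}
```

Note that the function reads `recommendation` directly and uses factors only for the log message, in line with the "source of truth" bullet.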
What gets checked
The trust gate is composed of four independent signals, each evaluated by the paid endpoint:
| source value | Upstream | What it catches |
|---|---|---|
| ofac | U.S. Treasury SDN XML feed (cryptocurrency-tagged via Feature 345 / Detail 1432) | Sanctioned-address swaps. Refreshed every 24h from Treasury. Match here forces recommendation: block regardless of other signals. |
| goplus | GoPlus trust-list + token-security API | Known-malicious contract patterns: honeypots, blacklist functions, owner-can-mint, transfer-pause, hidden-fee, etc. |
| etherscan_source | Etherscan getSourceCode | Unverified-source contracts (a weak but consistent signal; most legitimate tokens are source-verified within days of deploy). |
| anomaly | On-call analysis | Contract age windows (under 1h / 24h / 7d), address-kind classification (contract vs. EOA via eth_getCode), and no-outbound transaction history. |
This isn't a magic detector for every kind of malicious token. It's a layered gate where each layer catches a different class of failure, and the layered response gives the agent (or operator) enough information to decide the right action.
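To illustrate what a contract-age window means in practice, the sketch below buckets a contract by seconds since deployment using the three thresholds from the table. This is not the server's implementation, which this post does not show; the `AgeWindow` labels and the `ageWindow` function are hypothetical.

```typescript
// Hypothetical labels for the windows named in the table (1h / 24h / 7d).
type AgeWindow = "under_1h" | "under_24h" | "under_7d" | "older";

const HOUR_SECONDS = 3600;

// Both arguments are Unix timestamps in seconds.
function ageWindow(deployedAtSec: number, nowSec: number): AgeWindow {
  const age = nowSec - deployedAtSec;
  if (age < 0) throw new Error("deploy time is in the future");
  if (age < HOUR_SECONDS) return "under_1h";
  if (age < 24 * HOUR_SECONDS) return "under_24h";
  if (age < 7 * 24 * HOUR_SECONDS) return "under_7d";
  return "older";
}
```

A younger window would plausibly weigh toward caution, but how the server combines it with the other signals is not something this sketch claims to reproduce.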
The signals you won't find here, by design:
- LP-lock status (we don't yet trust the data sources well enough to weight this)
- Deployer rug history (graph-link analysis is computationally expensive and signal-noisy)
- Pump-dump / wash-trade signals (these are post-hoc — useful for forensics, not pre-trade verification)
This is one of the honest tradeoffs of a pre-trade gate: you can only block what you can detect quickly. Signals that require deep analysis fit a different layer of the stack.
Switching to paid mode
Once you've confirmed the integration works in preview, switch to paid mode for production. The flag is one line:
const agentkit = await AgentKit.from({
  walletProvider,
  actionProviders: [
    paladinTrustActionProvider({ mode: "paid" }),
  ],
});
Paid mode costs $0.001 USDC per call on Base, settled via
x402 EIP-3009. The agent's wallet (any EvmWalletProvider —
Viem, CDP, Privy, etc.) signs an EIP-3009 authorization; the x402
facilitator submits the on-chain transfer and pays gas. No ETH is
required from the agent's wallet — x402 EIP-3009 settlement is
gasless from the signer's perspective. Fund the agent's wallet with
~$0.10 USDC (100 calls at the declared price) plus a small buffer for the
per-call $0.01 client-side cap (see the pre-sign safety section below).
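The funding arithmetic is easier to check in USDC base units (USDC uses 6 decimals, so $0.001 is 1,000 units and $0.01 is 10,000 units). The helper names below are hypothetical; the sketch just computes the balance needed at the declared price and the worst case if every call were charged at the client-side cap.

```typescript
const USDC_DECIMALS = 6;
const PRICE_UNITS = 1000; // $0.001 declared price, in base units
const CAP_UNITS = 10000; // $0.01 client-side cap, in base units

interface Budget {
  declared: number; // base units needed at the declared price
  worstCase: number; // base units needed if every call hit the cap
}

function budgetUnits(calls: number): Budget {
  if (!Number.isInteger(calls) || calls < 0) {
    throw new Error("calls must be a non-negative integer");
  }
  return { declared: calls * PRICE_UNITS, worstCase: calls * CAP_UNITS };
}

// Formats base units as a decimal USDC amount.
function toUsdc(units: number): string {
  return (units / 10 ** USDC_DECIMALS).toFixed(USDC_DECIMALS);
}
```

For 100 calls this gives 0.100000 USDC at the declared price and 1.000000 USDC in the capped worst case, which is one way to size the "small buffer" mentioned above.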
Paid responses return live evaluations with real: true on
successful factors. The schema is otherwise identical to preview — same
fields, same recommendation semantics, same fail-closed
contract.
The pre-sign safety guarantee
This is the differentiator that justifies the package existing alongside
AgentKit's generic x402ActionProvider.
A generic x402 transport lets your agent make HTTP-via-x402 calls to any URL the LLM produces. The wallet signs whatever authorization the server's 402 challenge demands. If the server is compromised, or the URL is rewritten by a prompt-injection attack, the signed authorization could go to an unintended recipient.
@paladinfi/agentkit-actions validates the server's 402
challenge against hard-coded constants before viem signs:
- USDC contract: 0x833589fCD6eDb6E08f4c7C32D4f71b54bdA02913 (Base USDC, hard-coded)
- Treasury (recipient): 0xeA8C33d018760D034384e92D1B2a7cf0338834b4 (PaladinFi treasury, hard-coded)
- Max amount: $0.01 per call (a defense-in-depth ceiling above the declared $0.001 price; a compromised server could overcharge up to 10× before the client aborts, but not arbitrary amounts)
- Settlement: EIP-3009 only (no Permit2, no infinite-approval patterns)
- Validity window: ≤10 minutes from challenge (MAX_VALIDITY_SECONDS = 600 in src/x402/constants.ts)
If any field deviates, the call aborts client-side before viem
signs, with an error prefixed
paladin-trust BLOCKED pre-sign: so operators can grep and
alert. Within this action's scope, a compromised PaladinFi server cannot
redirect a signed authorization to a different recipient, asset, or chain
— payment-path safety holds even if the server is fully adversarial.
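The pre-sign check described above can be pictured roughly as follows. This is a simplified sketch, not the contents of src/x402/validate.ts: the `Challenge` shape, its field names, and the `"eip3009"` scheme label are assumptions made for illustration, while the addresses, the cap, the validity limit, and the error prefix are the values quoted in this post.

```typescript
const USDC_BASE = "0x833589fCD6eDb6E08f4c7C32D4f71b54bdA02913";
const TREASURY = "0xeA8C33d018760D034384e92D1B2a7cf0338834b4";
const MAX_AMOUNT_UNITS = 10000; // $0.01 in 6-decimal USDC base units
const MAX_VALIDITY_SECONDS = 600;
const BASE_CHAIN_ID = 8453;

// Hypothetical, simplified view of a 402 challenge; real payloads differ.
interface Challenge {
  asset: string;
  payTo: string;
  amountUnits: number;
  chainId: number;
  scheme: string;
  validitySeconds: number;
}

const sameAddress = (a: string, b: string): boolean =>
  a.toLowerCase() === b.toLowerCase();

// Throws before anything is signed if any field deviates from the constants.
function assertChallengeSafe(c: Challenge): void {
  const fail = (why: string): never => {
    throw new Error(`paladin-trust BLOCKED pre-sign: ${why}`);
  };
  if (!sameAddress(c.asset, USDC_BASE)) fail("unexpected asset");
  if (!sameAddress(c.payTo, TREASURY)) fail("unexpected recipient");
  if (c.chainId !== BASE_CHAIN_ID) fail("unexpected chain");
  if (c.scheme !== "eip3009") fail("unexpected settlement scheme");
  if (!Number.isInteger(c.amountUnits) || c.amountUnits <= 0 || c.amountUnits > MAX_AMOUNT_UNITS) {
    fail("amount outside allowed range");
  }
  if (c.validitySeconds <= 0 || c.validitySeconds > MAX_VALIDITY_SECONDS) {
    fail("validity window too long");
  }
}
```

The design point is ordering: validation runs on the server's challenge before the signer is ever invoked, so a rejected challenge never produces a signature that could be replayed.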
What a compromised server could still do is return a wrong verdict (for
example, recommendation: "allow" on a malicious token), not cause
direct fund loss. The trust-verdict logic is the part where you have to
trust our evaluation, which is why the source is public, the response
shape is versioned (trust.version), and the architectural
mitigation is layered defense: this gate handles "is this destination
known-bad"; your agent should also enforce its own spending caps,
allowlists for high-value transfers, and slippage tolerances.
Defense-in-depth is the rule; this is one layer of it.
The hard-coding is the architectural commitment. If we ever need to change
the treasury or the price, it's a package version bump (visible in
npm diff and to anyone pinning the version), not a server-side
config flip. Operators downstream see the change and accept it explicitly.
You can verify the constants yourself in
src/x402/validate.ts
— the file is short, well-commented, and the entire pre-sign hook fits on
a single screen.
Limitations to know
The honest disclosures matter more than the marketing claims. A few things this package doesn't do, that you should know before relying on it:
- Coverage is not exhaustive. GoPlus signals are a leading indicator; recently-deployed contracts may not yet be classified. A scam token that launches and rugs within 6 hours can clear our checks before GoPlus has indexed it. The anomaly-heuristics layer is partial mitigation (contract-age windows fire on recently-deployed contracts from a different angle), but a 6-hour-old contract with a verified source can clear our checks and still be malicious.
- Base only. supportsNetwork rejects all other networks. Multichain expansion is on the roadmap; the rate-limit signals on Ethereum L1 vs. Base differ, so we'd rather ship per-chain coverage with chain-aware heuristics than a multichain mode that re-uses Base assumptions.
- AgentKit alpha drift. Tested against @coinbase/agentkit@0.10.4. AgentKit's API is still evolving; minor releases may break the integration. The peer-dep pin is exact for this reason. We bump the pin as new AgentKit minors land and the integration is re-verified.
- Latency. A small client-side measurement from us-east-2 against swap.paladinfi.com/v1/trust-check/preview shows TTFB consistently around 90-110ms. Measure your own RTT before treating any number as authoritative; your agent's geographic position matters more than ours. The endpoint is not on a CDN today, so RTT is roughly the speed-of-light bound from your caller to us-east-2.
- Rate limits. Public rate-limit thresholds are not yet published. Today's deployed server applies per-IP rate-limiting on the preview endpoint and per-wallet rate-limiting on the paid endpoint; the specific thresholds are intentionally not surfaced while we calibrate against real traffic patterns. If you hit a 429, the response carries a structured retry_after field. Long-term we'll publish floors in a public rate-limit reference doc.
- The trust gate is not the only layer. A production agent should also enforce: spending caps, allowlists for high-value transfers, slippage tolerances, time-of-day controls, and human-confirmation for transactions above a threshold. The trust gate handles "is this destination known-bad"; the other layers handle "is this transaction within scope for this agent." Both matter.
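As a sketch of how the trust verdict can sit alongside the other layers from the last bullet, the function below combines a per-transaction spending cap, an allowlist, and a human-confirmation threshold with the recommendation. All names and the policy shape are hypothetical; slippage and time-of-day controls are left out to keep it short.

```typescript
interface PlannedTransfer {
  to: string;
  amountUsd: number;
}

// Hypothetical operator policy; the thresholds are yours to choose.
interface AgentPolicy {
  perTxCapUsd: number; // hard ceiling per transaction
  confirmAboveUsd: number; // above this, non-allowlisted transfers need a human
  allowlist: string[]; // destinations pre-approved for larger transfers
}

type ScopeDecision = "execute" | "needs_human" | "reject";

function checkScope(
  tx: PlannedTransfer,
  policy: AgentPolicy,
  recommendation: string,
): ScopeDecision {
  // "Is this destination known-bad?" comes from the trust gate.
  if (recommendation === "block") return "reject";
  // "Is this within scope for this agent?" comes from local policy.
  if (tx.amountUsd > policy.perTxCapUsd) return "reject";
  if (recommendation !== "allow") return "needs_human";
  const allowlisted = policy.allowlist.some(
    (a) => a.toLowerCase() === tx.to.toLowerCase(),
  );
  if (tx.amountUsd > policy.confirmAboveUsd && !allowlisted) return "needs_human";
  return "execute";
}
```

Neither layer can override the other toward execution: a clean verdict does not lift the spending cap, and an allowlisted destination does not bypass a block.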
Where next
The sister package
@paladinfi/eliza-plugin-trust
ships the same trust-check semantic for ElizaOS agents. Both share the
same security architecture (hard-coded constants + pre-sign hook); a CI
drift check enforces byte-for-byte parity on the security-critical files.
Source, demo walkthrough, and the full security model are public:
- Source: github.com/paladinfi/agentkit-actions
- DEMO.md: github.com/paladinfi/agentkit-actions/blob/main/DEMO.md — includes the on-chain settlement tx hash (Basescan link), full request/response capture, and the EIP-3009 authorization payload that was signed
- Vulnerability disclosure: SECURITY.md — email
dev@paladinfi.com; please do not open public issues for security findings
For agents already running in production: start in preview mode, observe
for a week when the agent calls the tool and how it handles the sample
recommendations against your normal traffic, then flip to
paid mode once you trust the layer's behavior. The cost at scale is
bounded ($0.001 × the number of trust checks you run per week, at the
declared price), and the failure mode is warn-fail-closed, so the system
degrades gracefully rather than going dark or silent-allowing.
The architectural pattern matters more than this specific package. Whatever you pick (this one, a competing tool, or your own internal check), pre-trade trust verification is increasingly load-bearing for agents handling user funds, and three elements generalize beyond any specific implementation: an independent verification destination, hard-coded constants, and a structured recommendation.
Operated by Malcontent Games LLC (Michigan, BF1971980-1), doing business as PaladinFi. Public API at swap.paladinfi.com; operational status at swap.paladinfi.com/health; terms at paladinfi.com/terms. MIT-licensed source.