Methodology
A complete, transparent breakdown of how the experiment works — what data the models receive, how they process it, and how trades are scored.
Experiment Design
Two frontier AI models — Claude Opus 4.6 (Anthropic) and GPT-5.4 (OpenAI) — compete in a structured 6-week live trading experiment using demo accounts hosted by Pepperstone Markets. Both models receive identical trading conditions, identical data, and identical analysis infrastructure through SkyAnalyst AI. Every trade, every decision, every “no-trade” call is logged and published.
The experiment is divided into three phases across six weeks. Phase 1 covers US indexes (US30, NAS100, SPX500). Phase 2 covers forex and gold (EUR/USD, GBP/USD, XAUUSD). Phase 3 combines all instruments simultaneously. Each model manages a $50,000 demo account with strict 1% risk per trade.
Live Trading on Pepperstone Markets
This is not a backtest. Both models trade in real time under live market conditions using two separate $50,000 demo accounts hosted by Pepperstone Markets, a globally regulated broker operating under ASIC, FCA, and CySEC oversight. Demo accounts mirror live market conditions: real spreads, real price feeds, real execution timing.
Broker Details
- Broker: Pepperstone Markets
- Account Type: Demo (Standard Spread, No Commission)
- Platform: cTrader / MT5
- Regulation: ASIC, FCA, CySEC, DFSA, SCB
Account Configuration
- Starting Balance: $50,000 per model
- Risk Per Trade: 1% of account equity
- Execution: SkyAnalyst Proprietary Trading Bridge → cTrader / MT5
- Verification: All fills independently auditable via broker statements
Third-Party Verification — Myfxbook
Both accounts are publicly tracked on Myfxbook for additional transparency. Equity curves, trade history, drawdown, and profitability metrics are independently verified and available in real time on each account's public Myfxbook profile.
Trades are executed automatically through the SkyAnalyst Proprietary Trading Bridge, which connects the AI analysis layer directly to cTrader and MT5. When a model issues an entry signal, the bridge translates it into a real order with proper lot sizing (calculated from the 1% risk rule and the structural stop distance), stop loss, and three take-profit levels — executed in milliseconds with no human in the loop.
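The lot-sizing arithmetic described above (1% of equity divided by the stop distance) can be sketched as follows. This is a minimal illustration under stated assumptions, not the bridge's actual code: the function name, the `value_per_point_per_lot` parameter, and the example prices are hypothetical, and real contract specifications vary by instrument.

```python
def position_size(equity: float, risk_fraction: float, entry: float,
                  stop: float, value_per_point_per_lot: float) -> float:
    """Return the lot size that loses `risk_fraction` of equity if the stop is hit.

    `value_per_point_per_lot` is a hypothetical parameter: the account-currency
    value of a one-point price move for one lot of the instrument.
    """
    stop_distance = abs(entry - stop)
    if stop_distance == 0:
        raise ValueError("stop must differ from entry")
    risk_amount = equity * risk_fraction  # e.g. 1% of $50,000 = $500
    return risk_amount / (stop_distance * value_per_point_per_lot)


# Hypothetical example: $50,000 equity, 1% risk, 50-point stop, $1 per point per lot.
lots = position_size(50_000, 0.01, entry=39_000.0, stop=38_950.0,
                     value_per_point_per_lot=1.0)
print(round(lots, 2))  # 10.0
```

Because the risk amount is fixed by the 1% rule, a wider structural stop produces a proportionally smaller position.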
Why demo accounts instead of live capital? Transparency and reproducibility. Demo accounts on Pepperstone mirror live market conditions while eliminating the variable of capital risk, which would introduce regulatory and ethical complications for a public experiment. The results are verifiable, the conditions are real, and the execution is identical to what a live account would produce.
What the Models Receive
Before each trading session, SkyAnalyst AI assembles a structured data packet of approximately 100,000 tokens per instrument. This is not a simple price feed; it is a professional-grade analysis environment comparable to what a Chartered Market Technician would review before making a trading decision. The data packet contains four layers:
Layer 1 — Multi-Timeframe Candle Data
Five hours of price action across three timeframes: 60-minute, 15-minute, and 5-minute candles. Each candle includes open, high, low, close, and volume, plus an indicator overlay of EMA, ATR, MACD, RSI, and VWAP.
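As an illustration of how one overlay value attached to each candle might be derived, here is a minimal sketch of an exponential moving average (the framework refers to EMA 9/21/50). The smoothing factor 2 / (period + 1) is the standard definition; seeding the series with the first close is an assumption, and SkyAnalyst AI's actual computation is not documented here.

```python
def ema(values: list[float], period: int) -> list[float]:
    """Exponential moving average of `values`, seeded with the first value (assumption)."""
    if period < 1:
        raise ValueError("period must be >= 1")
    if not values:
        return []
    alpha = 2 / (period + 1)  # standard EMA smoothing factor
    out = [float(values[0])]
    for price in values[1:]:
        # Blend the new price with the previous EMA value.
        out.append(alpha * price + (1 - alpha) * out[-1])
    return out


closes = [1.0, 2.0, 3.0]  # toy closing prices, not real market data
print(ema(closes, period=3))  # [1.0, 1.5, 2.25]
```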
Layer 2 — Session Structure & Fibonacci
Key reference levels from each trading session — Tokyo, London, and New York highs and lows — plus Fibonacci retracement and extension levels computed from the dominant swing. These provide the structural framework the models use to identify entry zones, stop loss placement, and profit targets.
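The Fibonacci levels mentioned above can be sketched for an upward swing as follows. The specific ratios (0.382, 0.5, 0.618 retracements; 1.272 and 1.618 extensions) are commonly used values chosen here as an assumption; the source does not state which ratios are computed or how the dominant swing is selected.

```python
def fib_levels(swing_low: float, swing_high: float) -> dict[str, float]:
    """Retracement and extension prices for an up-swing from swing_low to swing_high.

    The ratio set is an assumption for illustration only.
    """
    if swing_high <= swing_low:
        raise ValueError("swing_high must be above swing_low")
    span = swing_high - swing_low
    # Retracements are measured back down from the swing high.
    retracements = {f"ret_{r}": swing_high - span * r for r in (0.382, 0.5, 0.618)}
    # Extensions project the swing beyond the high, measured from the swing low.
    extensions = {f"ext_{e}": swing_low + span * e for e in (1.272, 1.618)}
    return {**retracements, **extensions}


levels = fib_levels(100.0, 200.0)  # toy swing, not real market data
for name, price in levels.items():
    print(name, round(price, 2))
```

For a downward swing the same arithmetic applies with the directions mirrored.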
Layer 3 — Macro Context Window (5-Day)
A rolling 5-day snapshot of cross-asset market conditions (including VIX, DXY, oil, gold, the 10-year yield, and NYAD), delivered as structured JSON with daily values, EMAs, and range positions.
Layer 4 — AI Agent Pre-Analysis
Before the models even begin their analysis, two specialized AI agents within SkyAnalyst AI have already processed the data:
Macro Analysis Agent
Synthesizes the macro environment, economic calendar releases, and intermarket correlations into a directional bias with a confidence score and tradeability rating. Outputs both an intraday and multi-day horizon assessment.
Trend Authority Agent
Evaluates the technical structure — EMA alignment, momentum, regime classification (trending, ranging, volatile) — and provides direction, confidence, key support/resistance levels, and an invalidation price. Also recommends position sizing adjustments based on volatility.
The complete data packet also includes the day's economic calendar with impact ratings, any pre-market news summaries, and the previous session's analysis for continuity. Prompts may vary per instrument to account for asset-specific dynamics (equity index vs. forex vs. commodities).
Classical CMT Methodology
The analysis framework is rooted in classical Chartered Market Technician (CMT) methodology. We deliberately do not use alternative technical frameworks such as ICT concepts, Fair Value Gaps (FVG), order blocks, or other non-classical approaches. The indicator suite is a curated subset of proven CMT tools — EMA trend structure, RSI momentum, MACD confirmation, ATR-based volatility scaling, VWAP anchoring, and Fibonacci levels.
Each model receives a persistent system prompt that defines its reasoning framework — the “playbook” that stays constant across every session. This is the actual instruction set:
System Prompt — Decision Framework
1. Risk regime: Read the Macro Agent's bias, confidence, and tradeability. Check the cross-asset environment (DXY, 10Y, VIX, NYAD). If tradeability is low, raise the bar or stand aside. Classify the environment.
2. Agent synthesis: Read the Trend Agent's direction, confidence, regime, key levels, and invalidation. When both agents agree with solid confidence, strongest foundation. When they conflict, note why and reduce conviction. Trending regime favors continuation; ranging favors mean-reversion toward VWAP.
3. Session context: Assess gap vs prior close relative to ATR. Read the session handoff from 60min candles — where is price relative to session high/low and VWAP? Identify the 1–2 dominant drivers today.
4. Multi-timeframe read: 60min for bias (EMA 9/21/50, RSI, MACD). 15min for structure. 5min for entry precision — VWAP tests, EMA pullbacks, session levels, Fibonacci zones. Entry zones must be at 5m/15m structural levels.
5. Calendar gate: No entries within 15 minutes of high-impact events. If data already released, assess the reaction and whether it has settled.
6. Build or pass: Only propose setups where macro environment, agent signals, and technical structure all support the direction. If any domain actively contradicts, state the conflict and reduce confidence or pass. Stop placement is structural, scaled to current volatility: on compressed days (VIX declining, narrow ranges), stops tighter near structure; on expanding days (VIX rising, wide ranges), stops wider — but the setup must still meet minimum 1.5:1 R:R after the wider stop. If volatility makes R:R unworkable at structural stop levels, No Trade. If the structural stop exceeds the Trend Agent invalidation level, skip the setup. Add a small buffer beyond the stop for execution slippage — setups are forwarded directly to an automated trading system. TP1 should target 1R–1.25R at a structural level. If no structure exists in that zone, evaluate the full target profile — a close TP1 with a strong TP2 at 2R+ is a valid trade. Reject only when the trade is structurally inverted: the highest-probability exit delivers less than 1R and reaching further targets requires breaking through major levels.
If conviction is low, No Trade is the correct output.
This framework is identical for both models. It defines how they reason — the session data (next section) defines what they reason about.
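Two of the mechanical rules in step 6 (the minimum 1.5:1 R:R and the invalidation check) can be expressed as a small filter. This is a sketch under assumptions, not the models' actual logic: the function names are hypothetical, the target used for R:R is left to the caller because the prompt does not say which take-profit level the 1.5:1 minimum applies to, and "stop exceeds invalidation" is interpreted as the stop sitting farther from entry than the Trend Agent's invalidation price.

```python
def r_multiple(entry: float, stop: float, target: float) -> float:
    """Reward-to-risk ratio: distance to target divided by distance to stop."""
    risk = abs(entry - stop)
    if risk == 0:
        raise ValueError("stop must differ from entry")
    return abs(target - entry) / risk


def check_setup(direction: str, entry: float, stop: float, target: float,
                invalidation: float, min_rr: float = 1.5) -> tuple[bool, str]:
    """Return (accepted, reason) for a proposed setup."""
    if direction not in ("long", "short"):
        raise ValueError("direction must be 'long' or 'short'")
    is_long = direction == "long"
    # Interpretation (assumption): a long stop below the invalidation price,
    # or a short stop above it, "exceeds" the invalidation level.
    if (is_long and stop < invalidation) or (not is_long and stop > invalidation):
        return False, "stop beyond Trend Agent invalidation"
    if r_multiple(entry, stop, target) < min_rr:
        return False, "R:R below minimum"
    return True, "ok"


# Toy prices, not real market data.
print(check_setup("long", entry=100.0, stop=98.0, target=104.0, invalidation=97.0))
print(check_setup("long", entry=100.0, stop=98.0, target=102.0, invalidation=97.0))
```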
What Changes Every Day
While the system prompt (above) stays constant, the session data changes with every trading day. Below is the structure of the data packet assembled by SkyAnalyst AI and injected alongside the system prompt. Prompts may vary per instrument. The actual data values, candle arrays, and agent outputs are omitted; in production, this packet is approximately 100,000 tokens.
// System: You are an expert CMT trading analyst...
=== MACRO ANALYSIS AGENT (EURUSD / forex) ===
Group Bias: bull (confidence: 28%) | Data age: 12min
EURUSD Bias: strong_bear (score: -75) | Confidence: 22%
Horizon: intraday=bear, short-term=strong_bear
Tradeability: high (72/100)
[BEARISH] Fed-ECB rate divergence: 160+ bps...
[BEARISH] Eurozone energy shock: 3-4x baseline...
=== END MACRO ANALYSIS AGENT ===
=== TREND AUTHORITY AGENT ===
Direction: BULLISH | Confidence: 64% (Moderate)
Regime: Trending | Strength: Moderate
Key Resistance: 4593.3 | Key Support: 4553.28
VWAP: 4554.9 | Invalidation: 4546.92
Recommendation: Reduce size (VIX elevated)
=== END TREND AUTHORITY AGENT ===
### Economic Calendar
10:00am USD - JOLTS Job Openings [HIGH IMPACT]
(Forecast: 6.89M, Previous: 7.24M)
### Market Indicators (5-Day JSON)
{ vix, dxy, oil, gold, 10y_yield, nyad... }
### 60min Candles (5h) + EMA, ATR, MACD, RSI, Vol, VWAP
### 15min Candles (5h) + full indicator suite
### 5min Candles (5h) + full indicator suite
### Session Levels
Tokyo High/Low, London High/Low, NY High/Low
Fibonacci retracement/extension levels
// Instruction: Follow the 6-step framework.
// Produce 0-2 setups with SL and 3 TP levels.
// If conviction is low, No Trade is correct.
Note: This is a simplified representation. The full prompt includes complete candle arrays with all indicator values, detailed agent reasoning, and instrument-specific context.
API Settings & Fairness Controls
Both models are called via their respective official APIs with settings designed for maximum reasoning depth. Neither model has a temperature advantage — GPT 5.x does not support temperature settings, and Claude Opus 4.6 uses its default. Both receive identical input data and identical system prompts.
| Setting | GPT-5.4 | Claude Opus 4.6 |
|---|---|---|
| Model ID | gpt-5.4-2026-03-05 | claude-opus-4-6 |
| Max Tokens | 25,000 tokens | 25,000 tokens |
| Temperature | Not supported (GPT 5.x) | Not explicitly set |
| Reasoning Effort | High | Default (High) |
| Timeout | 120s (2 min) | 120s (2 min) |
| API Endpoint | /v1/chat/completions | /v1/messages |
Responses from both models are streamed. Both run under the same 25,000-token output limit shown in the table; in practice, session analyses typically use 4,000–8,000 completion tokens.
What the Models Produce
After processing the ~100K token data packet, each model produces a comprehensive session analysis that typically identifies 1–2 trade setups. Each setup includes:
Direction & Thesis
Long or short, with a written rationale explaining the confluence of signals
Entry Zone
A price range at a structural level, not a single price point
Stop Loss
Structural stop scaled to current volatility via ATR, with slippage buffer
3 Take-Profit Levels
TP1 at ~1R (structural), TP2 at ~2R, TP3 at extended target. Minimum 1.5:1 R:R required
Confidence Gate
6-factor confluence scoring: macro alignment, yield direction, DXY, trend agent, key level, EMA stack
Risk Assessment
Specific risks to the trade with mitigation strategies (news events, resistance clusters, VIX-driven sizing)
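The six-factor confidence gate listed above could be represented as a simple count of aligned factors. The factor names come from the list; everything else is an assumption made for illustration, including equal weighting, boolean inputs, and the `min_aligned` threshold, none of which the source specifies.

```python
FACTORS = (
    "macro_alignment", "yield_direction", "dxy",
    "trend_agent", "key_level", "ema_stack",
)


def confluence_score(signals: dict[str, bool]) -> int:
    """Count how many of the six factors support the trade direction."""
    missing = [name for name in FACTORS if name not in signals]
    if missing:
        raise KeyError(f"missing factors: {missing}")
    return sum(bool(signals[name]) for name in FACTORS)


def passes_gate(signals: dict[str, bool], min_aligned: int = 4) -> bool:
    """Hypothetical gate: require at least `min_aligned` supporting factors."""
    return confluence_score(signals) >= min_aligned


example = {
    "macro_alignment": True, "yield_direction": True, "dxy": False,
    "trend_agent": True, "key_level": True, "ema_stack": False,
}
print(confluence_score(example), passes_gate(example))  # 4 True
```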
Once a setup triggers, the model monitors price in real time and issues an entry signal with a confidence percentage. The entry is then executed automatically via the SkyAnalyst Proprietary Trading Bridge to cTrader / MT5 on Pepperstone Markets, with no human touching the trade at any point.
Why This Cannot Be Replicated in ChatGPT or Claude Alone
You cannot reproduce this experiment by pasting a prompt into ChatGPT or Claude. Here's why:
No access to real-time market data
ChatGPT and Claude do not have access to live price feeds, broker candle data, or real-time economic calendar releases. The ~100K token data packet is assembled by SkyAnalyst AI from live broker APIs, structured and formatted specifically for LLM consumption.
No AI agent pre-processing layer
The Macro Analysis Agent and Trend Authority Agent are proprietary SkyAnalyst AI systems that run independently before the trading model sees the data. These agents provide the bias, confidence, regime classification, and tradeability scores that form the foundation of every analysis. Without them, the model is working blind.
No broker bridge for live execution
Analysis without execution is academic. The SkyAnalyst Proprietary Trading Bridge connects directly to cTrader and MT5 on Pepperstone Markets, translating the model's trade signals into real orders with proper lot sizing, stop loss, and take profit levels — executed in milliseconds. This is the difference between a research paper and a live trading system.
No real-time monitoring and entry timing
After the model identifies a setup, SkyAnalyst AI monitors price action in real time, waiting for the entry trigger conditions to be met. The model evaluates each new candle against its setup criteria and issues an entry signal only when conditions align. This monitoring loop cannot happen in a chat interface.
The importance of executing in a real broker environment — even on demo accounts — cannot be overstated. It forces the system to deal with real spreads, real slippage, real market gaps, and real execution timing. A backtest can always be curve-fit. A live demo account operating under real market conditions cannot.
Constraints & Risk Management
Trading Window
8:00–11:00 AM EST daily
Starting Balance
$50,000 per model
Risk Per Trade
1% of account equity
Minimum R:R
1.5:1 or no trade
News Exclusion
No entries within 15 min of high-impact events
Execution
SkyAnalyst Proprietary Trading Bridge → cTrader / MT5 on Pepperstone
Additional risk controls:
- Stop placement is structural, scaled to current volatility via ATR
- If VIX is elevated, position size is reduced per Trend Agent recommendation
- If structural stop exceeds Trend Agent invalidation level, setup is skipped
- If volatility makes R:R unworkable at structural stop levels, No Trade
- A “no-trade” decision is a valid, scored action
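The news exclusion rule above can be sketched as a time-window check. This is an illustrative assumption of how such a gate might work: the function name and event format are hypothetical, the window is treated as symmetric (15 minutes before and after the release), and all timestamps are assumed to share one timezone.

```python
from datetime import datetime, timedelta


def in_news_blackout(now: datetime,
                     events: list[tuple[datetime, str]],
                     window: timedelta = timedelta(minutes=15)) -> bool:
    """True if `now` falls within `window` of any high-impact event."""
    return any(
        abs(now - event_time) <= window
        for event_time, impact in events
        if impact == "high"
    )


# Hypothetical calendar: one high-impact release at 10:00 and one low-impact item.
calendar = [
    (datetime(2026, 3, 10, 10, 0), "high"),
    (datetime(2026, 3, 10, 9, 30), "low"),
]
print(in_news_blackout(datetime(2026, 3, 10, 9, 50), calendar))  # True
print(in_news_blackout(datetime(2026, 3, 10, 9, 30), calendar))  # False
```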
How Trades Are Scored
Performance is evaluated across five dimensions:
Total P&L
Absolute dollar and percentage return on the $50K account
Win Rate
Percentage of trades closed in profit
Max Drawdown
Largest peak-to-trough decline during the competition
Risk-Adjusted Return
Sharpe-like ratio measuring return per unit of risk taken
Consistency
Performance stability across trading days — penalizes boom/bust patterns
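Three of the dimensions above (win rate, max drawdown, and a Sharpe-like ratio) can be computed as in the sketch below. The exact formulas used for scoring are not given in this document, so these are conventional definitions chosen as assumptions: drawdown as a fraction of the running peak, and the Sharpe-like ratio as mean daily return divided by its sample standard deviation, with no risk-free rate or annualization.

```python
from statistics import mean, stdev


def win_rate(trade_pnls: list[float]) -> float:
    """Fraction of trades closed with positive P&L."""
    return sum(p > 0 for p in trade_pnls) / len(trade_pnls)


def max_drawdown(equity_curve: list[float]) -> float:
    """Largest peak-to-trough decline, as a fraction of the peak."""
    peak = equity_curve[0]
    worst = 0.0
    for value in equity_curve:
        peak = max(peak, value)
        worst = max(worst, (peak - value) / peak)
    return worst


def sharpe_like(daily_returns: list[float]) -> float:
    """Mean daily return per unit of daily volatility (needs at least 2 values)."""
    spread = stdev(daily_returns)
    return mean(daily_returns) / spread if spread else float("nan")


# Toy numbers, not results from the experiment.
pnls = [500.0, -500.0, 750.0]
equity = [50_000.0, 50_500.0, 50_000.0, 50_750.0]
print(round(win_rate(pnls), 3))        # 0.667
print(round(max_drawdown(equity), 4))  # 0.0099
```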
Transparency & Disclosures
This experiment is conducted by The AI Trading Benchmark, with trading infrastructure provided by SkyAnalyst AI (a product of SkyWeaver Trading LLC) and trade execution hosted by Pepperstone Markets. This is an independent research initiative. Results are published transparently regardless of outcome.
Claude is a trademark of Anthropic. GPT is a trademark of OpenAI. This experiment is independent and not endorsed by either company.
Trading involves risk. This experiment uses demo accounts and is conducted for educational and research purposes. Past performance — including results from this experiment — does not guarantee future results. Nothing published on this site constitutes financial advice.