Last updated May 12, 2026

Editor's note (updated 2026-05-12): This editorial has been revised to reflect Season 1 scope refinement. XAUUSD and USDJPY were removed from the experiment universe; the numbers and references below reflect the updated 4-instrument scope (NAS100, US30, US500, EURUSD). Original publication date preserved.

EDUARDO’S WEEKLY ANALYSIS — APRIL 20 — 24, 2026

AI Trading Benchmark Week 2 Results: Claude vs GPT Survive Trump-Iran Headline Week (April 20-24, 2026)

Claude closed -$348 on 3 trades, GPT -$742 on 3. Discipline beat the chop. No blow-ups, no catastrophic days.

Eduardo — Senior Research Editor

Key Findings

Both models survived the season's most volatile week with controlled losses — Claude closed at $50,610.09 (+1.22% season return), GPT at $49,257.72 (-1.48%). Neither blew up.
GPT was the only model green on Friday, taking [+$1,149.12 on NAS100](/articles/gpt-nas100-long-trade-april-24-2026) while Claude lost [-$940 on a US500 short](/articles/claude-us500-short-trade-april-24-2026) into the Trump-Iran whipsaw.
[Claude's EURUSD short on Wednesday](/articles/claude-eurusd-short-trade-april-23-2026) was the week's standout call: +1.67R and +$1,606.38 on a single setup, the largest single-trade gain across both models.
Together the models took 6 in-scope trades over five sessions: fewer entries than Week 1, a broader spread of instruments, and zero same-direction overlap in Friday's headline-driven session.
Season scorecard after two weeks: Claude 10 trades / 50.0% win rate / +0.90R / +$610.09 net, GPT 3 trades / 33.3% win rate / -0.98R / -$742.28 net. Claude leads on every aggregate stat.

Season Scorecard

Claude Opus 4.6

Win Rate: 50.0%
Season R: +0.9R
Net P&L: +$610
Trades: 10

GPT-5.4

Win Rate: 33.3%
Season R: -1.0R
Net P&L: -$742
Trades: 3

The Week in Macro

Week 2 belonged to one story: Trump-Iran. The headline risk that had been simmering since the start of the season detonated on Friday April 24, and the entire tape spent the back half of the week pricing what happens when geopolitics overrides every other input. That is the regime the AI Trading Benchmark traded into.

The opening sessions were quiet by recent standards. Monday and Tuesday produced no completed trades from Claude — the model entered NAS100 long at 26,595.80 on Monday but the position remained open as the week ended, never resolving to TP or SL inside the window. GPT was quiet through Tuesday as well, then took two losses on Wednesday inside a single morning window as US30 and US500 both moved against the model.

The dollar story shifted mid-week. DXY firmed from the prior week's lows as risk-off positioning built into Friday. Yields backed up. The risk-on instruments that had carried Week 1 became unreliable: Wednesday's chop took out two GPT stops on the index side inside two hours, the kind of move that tends to appear when intraday correlation regimes fracture.

Then Friday. Trump-Iran tension dominated every screen. Equity indices opened soft, ripped higher mid-session on a denial headline, then sold off again into the close. Intraday whipsaws hit every major instrument. Retail traders got chopped: by Sunday, social feeds were full of stopped-out longs and stopped-out shorts inside the same hour. The benchmark took two trades into that environment. Claude went short US500 and got stopped. GPT went long NAS100 and exited at TP1 for +$1,149.12 before the afternoon reversal. Combined Friday P&L across both models: +$209.12. That is a small number, and a green one, for a day that by those same accounts blew up many leveraged retail positions.

The macro frame for Week 2 is straightforward. Headline risk is now the dominant input. Setups that worked under the disinflation tape of Week 1 — long indices on dollar weakness — stopped working when the geopolitical bid replaced the macro bid. The models that survive this regime are the ones that take fewer entries, size them smaller, and accept that some weeks the right answer is a flat tape and a clean weekend.

About reported results. Each setup defines three take-profit targets (TP1, TP2, TP3), but the broker closes the full position at TP1 — so the realized R-multiple is always TP1's distance from entry when any TP is hit, and -1R on a stop. The dollar P&L shown in this editorial is the actual broker close at TP1 (or stop) for each trade. TP2 and TP3 are reported as informational levels: how far price ran after the broker had already exited.
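The reporting rule above can be sketched in a few lines of Python. This is a minimal illustration, not the benchmark's own code: `realized_r` is a hypothetical helper, and the example assumes that the stop (1.17180) and TP1 (1.16980) from the staged EURUSD analysis were carried unchanged into the live short filled at 1.17105.

```python
def realized_r(entry: float, stop: float, tp1: float, hit_tp: bool) -> float:
    """Realized R-multiple under the 'full close at TP1' rule.

    A stop-out is always -1R. Any take-profit hit is booked at TP1's
    distance from entry, measured in units of the entry-to-stop risk.
    """
    risk = abs(entry - stop)
    if risk == 0:
        raise ValueError("entry and stop must differ")
    if not hit_tp:
        return -1.0
    return abs(tp1 - entry) / risk


# Assumed levels: fill price from the ledger, stop and TP1 from the staged analysis.
eurusd_r = realized_r(entry=1.17105, stop=1.17180, tp1=1.16980, hit_tp=True)
print(round(eurusd_r, 2))
```

Under those assumed levels the helper returns about 1.67, in line with the +1.67R reported for the trade. TP2 and TP3 never enter the calculation, which is the point of the rule.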

Equity Curve

[Figure: equity curve, Apr 20 to Apr 24]

Head-to-Head
Metric	Claude	GPT
Trades	3	3
Wins	1	1
Losses	2	2
Win Rate	33.3%	33.3%
Net R	-0.3R	-1.0R
Net P&L	-$348	-$742
Biggest Win	+$1,606	+$1,149
Biggest Loss	-$1,015	-$1,101
Peak Balance	$51,550	$50,000
Trough Balance	$50,610	$48,109
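The count, win-rate, and P&L rows of the table can be reproduced from the six per-trade results quoted in this editorial. The sketch below is illustrative only: `summarize` is a hypothetical helper, and the methodology treats the broker balance, not a sum of trades, as authoritative, so this is a cross-check rather than the official calculation.

```python
from decimal import Decimal

# Per-trade net P&L for Week 2, as quoted in this editorial.
WEEK2_PNL = {
    "Claude": [Decimal("-1014.80"), Decimal("1606.38"), Decimal("-940.00")],
    "GPT": [Decimal("-1101.40"), Decimal("-790.00"), Decimal("1149.12")],
}


def summarize(pnls):
    """Return trade count, wins, losses, win rate (%), and P&L extremes."""
    wins = sum(1 for p in pnls if p > 0)
    losses = sum(1 for p in pnls if p < 0)
    return {
        "trades": len(pnls),
        "wins": wins,
        "losses": losses,
        "win_rate": round(100 * wins / len(pnls), 1),
        "net_pnl": sum(pnls, Decimal("0")),
        "biggest_win": max(pnls),
        "biggest_loss": min(pnls),
    }


for model, pnls in WEEK2_PNL.items():
    print(model, summarize(pnls))
```

For this week the trade sums match the balance-based figures to the cent (-$348.42 and -$742.28), so the two methods agree.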

Claude's Week

Claude was the better analyst and the worse executor this week. The model's read on EURUSD on Wednesday — short at 1.17105, exit at 1.1679, +1.67R analysis ride for +$1,606.38 — was the cleanest pre-trade thesis any model has produced on this benchmark. The dollar firmed into the European session, Claude was already positioned, and the trade scaled all three targets without a meaningful retracement. That is what conviction-on-conviction trading looks like when the macro tape cooperates.

The other two trades did not cooperate. Claude took a NAS100 long on Wednesday at 26,957 and got stopped out at 26,917.60 inside the first hour for -$1,014.80. The same session that paid on EURUSD was pure chop on the index side. Then Friday: US500 short at 7,125 stopped at 7,133.90 for -$940. Two losses, both inside headline-driven sessions, neither a thesis failure. Both were regime failures: entries that would have worked in a normal tape and didn't survive an abnormal one.

The character signature is consistent with what Claude showed in Week 1. The model takes more trades than GPT over the season (10 vs 3, although both took 3 this week), commits when it has a read, and accepts the variance that comes with higher frequency. The Wednesday EURUSD trade is the model in its best form. The Wednesday and Friday stops are the model in its worst form, but the worst form did not turn into a blow-up because Claude held to fixed risk per trade: each loss was capped at its predefined stop (-1R), and the cumulative drawdown across the two losing trades was -$1,954.80, less than 4% of the $50,000 starting capital.

Season-end-of-Week-2 position: $50,610.09, up 1.22% from the $50,000 starting balance. After two weeks and 10 total trades, Claude's win rate is 50.0% and net R is +0.90. The Wednesday EURUSD trade kept the season number positive in R terms despite the loss-heavy stretch through Wednesday and Friday.

The thing to watch in Week 3 is whether Claude maintains entry frequency or pulls back. The Friday US500 short was a thesis-correct setup in the wrong regime. A model that learns the regime shifted will trade less next week. A model that doesn't will repeat the same losses. Either response is informative — the benchmark is built to expose exactly this kind of regime-adaptation question, and Week 3 is when the answer starts to surface.

GPT's Week

GPT had its first in-scope trades of the season this week. The model's Week 1 ledger was empty after the scope refinement, so Week 2 is effectively the start of the head-to-head comparison on the ledger. The cadence: zero trades Monday, zero Tuesday, two losses Wednesday, zero Thursday, one win Friday. Three entries total. Two losses, one winner, net -$742.28 for the week.

Wednesday was where the week went sideways. GPT took two trades in a single morning session and lost both. US30 long at 49,563 stopped at 49,490 for -$1,101.40. US500 long at 7,129.60 stopped at 7,121.30 for -$790. Two trades, two stops, all inside the same risk-off window. The combined Wednesday damage was -$1,891.40 — the worst single day GPT has posted on this benchmark.

But GPT did not chase. Thursday produced zero trades. Friday produced exactly one entry: NAS100 long at 27,229.20, scaled at TP1 for +$1,149.12. That trade was the only green print on the Friday tape across both models. GPT was on the right side of the only direction that paid in the headline-driven session, and it took the partial rather than holding for higher TPs that the whipsaw would have erased.

The character signature emerging from GPT's first in-scope week: trades less frequently than Claude, accepts smaller targets, and is more willing to sit out a session entirely. Three trades for the week, one winner, two losses. Win rate 33.3%. Net R -0.98. Net P&L -$742.28. The Wednesday double-loss stretch was painful, but the Friday winner covered more than half of the damage and kept the week from running into deeper drawdown territory.

Season-end-of-Week-2 position: $49,257.72, down 1.48% from the $50,000 starting balance. After two weeks and 3 total trades, GPT's win rate is 33.3% and net R is -0.98. The model is behind Claude on every aggregate stat. It is also still in the simulation with no catastrophic days.

The thing to watch in Week 3 is whether GPT's selectivity is conviction or fear. Two trades on Wednesday and one on Friday is the right cadence for a chop-and-headline regime. But selectivity that becomes paralysis is worse than overtrading — every week without a winner compounds the season-return gap. The Friday partial proves the model still has the read; the test is whether it pulls the trigger at the right moments in Week 3's heavier macro calendar.

Surviving a headline-driven week without a blow-up is the trade of the season. Both AIs did it by trading less, sizing right, and refusing to chase the move that wasn't there.

Trump-Iran Friday: Why the Whipsaw Mattered

The defining session of Week 2 was Friday April 24, and the defining input was geopolitics. By the open, Trump-Iran tension was the only headline that mattered. Equity futures gapped lower, ripped on a denial wire mid-morning, then sold the rip into a soft close. Bond yields backed up and then collapsed. Every cross-asset correlation that had held through Week 1 broke inside a single session.

Two trades hit the benchmark that day. Claude shorted US500 at 7,125 and got stopped at 7,133.90 for -$940. GPT went long NAS100 at 27,229.20 and exited at TP1 for +$1,149.12. Combined day P&L across both models: +$209.12. For context, that is a green print on a session that, by the social-media accounts circulating by Sunday, blew up an unknown number of leveraged retail traders. Even with one of the two trades stopped out, the combined outcome was a small net positive.

That is the headline of Week 2. Not "Claude won" or "GPT won." Both models lost on net for the week — Claude -$348.42, GPT -$742.28. But neither model produced a single-day catastrophic loss, neither doubled down after a stop, and neither chased a reversal that would have erased a week of work in a single afternoon. Risk discipline is the season-level story.

Why Claude Outperformed on Wednesday

The single best trade of the week was Claude's EURUSD short on Wednesday April 23. Entry 1.17105, exit 1.1679, R-multiple +1.67, dollar return +$1,606.38 on the analysis ride. The dollar firmed into the European session as risk-off positioning built ahead of Thursday's calendar. Claude was short EURUSD before the move, scaled at TP1, held into TP3, and the trade closed without a meaningful retracement.

That is conviction-on-conviction trading. The thesis (dollar firming on shifting Fed expectations and risk-off flows) was correct. The execution (entry, sizing, target placement) was correct. And the timing — Wednesday, before the Friday headline shock — was correct in the sense that Claude captured the directional move before the regime fractured. After Friday's whipsaw, EURUSD reversed back through Claude's entry. A trader who held the position into the headline session would have given back most of the gain. Claude didn't, because the analysis-side targets resolved cleanly inside the window.

The same Wednesday session also produced Claude's NAS100 long at 26,957, stopped at 26,917.60 for -$1,014.80. The model nailed the EURUSD setup and missed the index setup in the same morning. That is the variance any high-frequency analyst absorbs. The Wednesday net for Claude was +$591.58 across two trades, which is what a 50%-win-rate day looks like when the single win is a +1.67R outlier.

GPT's Disciplined Friday: The Lone Green Print

GPT's NAS100 long on Friday was the only winning trade either model took during the headline session. Entry 27,229.20, exit at TP1 27,321.90, +$1,149.12 on a +1.02R blended exit. The position scaled at the first take-profit and closed before the afternoon reversal. That is exactly the right execution for a headline-driven chop session — take the partial, lock the profit, and let the rest of the day produce zeros.

GPT did not take a second trade Friday. Did not chase the reversal, did not flip short on the rip, did not add to the winner. One entry, one partial, one closed position. Compare this to Claude's Friday — a single US500 short stopped inside the same whipsaw window — and the instrument-selection difference is the difference between green and red on the day. GPT's selectivity in chop is the model's most interesting Week-2 character trait.

The Wednesday double-loss complicates the GPT picture. Two stops in a single session, -$1,891.40 combined, is an ugly print regardless of context. But the model recovered most of it inside a single Friday partial, which suggests that the Wednesday losses were not thesis failures so much as concentrated risk in a single regime window. A model that takes two trades inside the same risk-off morning is not diversifying timing; it is concentrating it. That is a tractable mistake.

Same-Direction Overlap: Zero on Friday

Week 2 produced zero same-direction overlaps on the most important day. On Friday, Claude was short US500; GPT was long NAS100. Two different instruments, two different directions on the index side, no shared thesis.

That divergence matters because it tells you the models are reading the regime differently. In a clean directional tape, two AI models can agree on direction and split on instrument. In a chop tape, they can disagree on direction entirely — and on Friday, they did. Whether Claude's read or GPT's read was "right" is the wrong question. The tape was a coin flip. The right question is which model sized the coin flip correctly. GPT did. Claude did not.

What the Season Scorecard Says

After two weeks and 13 total trades (Claude 10, GPT 3), Claude leads on every aggregate stat. Win rate 50.0% vs 33.3%. Season net R +0.90 vs -0.98. Closing balance $50,610.09 vs $49,257.72. Net P&L gap of $1,352.37 between the two models, in Claude's favor.

But here is the season-level story that the scorecard understates: both models are still in the simulation. After two weeks, the second of which was a textbook headline-driven chop, both AI models are within 2% of their starting balance. The benchmark has not yet seen a -10% week from either model, has not seen a single day worse than -4% from either model, and has not seen the kind of revenge-trading behavior that turns a -1% week into a -5% week. The season-level takeaway after Week 2 is not who's winning. It's that nobody has lost yet.

The Trade of the Week

Trade of the Week for the AI Trading Benchmark's second editorial cycle goes to Claude's EURUSD short on Wednesday April 23. The setup was textbook. The execution was clean. The dollar-return was the largest single-trade print across both models for the week (+$1,606.38), and the R-multiple (+1.67R) was the only number above 1.5R that the benchmark produced in five sessions.

The setup context: Wednesday opened with the dollar already firming. DXY had bid up off the Tuesday lows on a pre-positioned risk-off shift ahead of the Thursday and Friday calendars. EURUSD was rejecting 1.1730 from below, the prior week's range high, and the four-hour structure had rolled over. Claude entered short at 1.17105 — the precise inflection where the rejection pattern resolved into directional sell flow — and placed a stop above the swing high. Risk was sized at $499.20, exactly 1% of capital at the time of entry. Position size was 13.06 lots.

What followed was a near-perfect resolution. Price walked through TP1 inside the European session, scaled at TP2 in the New York morning, and closed the analysis-side at TP3 (1.1679) before the New York lunch chop. Total move from entry to TP3: 31.5 pips. Total time in trade: roughly six hours. No meaningful retracement against the position from entry to final exit.

The reason this is the Trade of the Week and not just a routine TP3 hit is the timing. Claude had this trade on the books before Friday's regime shift. EURUSD reversed back through 1.17105 on the Friday Trump-Iran headlines — a trader holding the position into Friday would have surrendered most of the gain. Claude exited cleanly inside the Wednesday session, captured the full analysis-side R, and was flat on EURUSD before the headline tape arrived. That is what timing-aware conviction looks like. The next sections show the broker execution and the pre-trade analysis Claude actually produced.

Account Performance

Profit taken at TP1 — the full position is closed at the first target to keep results measurable and comparable across models.

Instrument	Lots	Entry / Exit	Net P&L	Risk (1%)	Balance
EURUSD-Pepperstone (sell)	13.06	1.17105 → 1.1679	+$1,606.38	$499.20	$51,550.09

Season: $50,000.00 → $50,610.09, +$610.09 (+1.2%), 10 trades

AI Analysis Output

What the AI Actually Saw

Full 10.5k-character analysis produced by the AI before entering this trade. Unedited.

Multi-timeframe technical analysis
Key support and resistance levels
Pattern recognition and trade setup identification
Risk-reward assessment with defined invalidation
Macro context and cross-asset correlation check

SkyAnalyst AI

EURUSD NY AM Session Analysis — April 23, 2026

Current Price: ~1.17063 | Time: ~10:25 AM ET | Session: NY AM (active)

Step 1: Macro Regime Assessment

Indicator	Value	5D EMA	Trend	EURUSD Implication
DXY	98.581	98.343	Above EMA & rising (98.014 → 98.346 → 98.573 → 98.581)	Bearish EURUSD
US 10Y	4.290	4.289	Rising (4.258 → 4.299 → 4.305 → 4.290, near highs)	Bearish EURUSD
VIX	19.09	18.95	Slightly above EMA, marginally rising from 18.91 close	Mildly bearish EURUSD

DXY is the dominant driver. It is clearly above its 5-day EMA and has risen steadily over the past 4 sessions. This is the strongest bearish signal for EURUSD.

Macro Fundamental Context:

EUR PMIs disappointed badly — German Services 46.9 vs. 50.4 expected (massive miss), French Services 46.5 vs. 48.5. This is a significant growth shock for the Eurozone.
US PMIs beat — Manufacturing 54.0 vs. 52.5, Services 51.3 vs. 50.5. US data reinforces relative strength.
Unemployment Claims slightly soft (214K vs. 211K) but immaterial relative to PMI divergence.

Trend Agent Macro Note: "SUPPORTIVE — Macro backdrop favors USD strength: US flash PMIs beat while Eurozone services PMIs disappointed sharply, reinforcing growth divergence."

Macro Bias: BEARISH EURUSD | Confidence: ~70–75 (strong DXY trend + fundamental PMI divergence + yield support)

Hard Rule Check: Macro confidence is at/above 70, and DXY trend aligns bearish. Do not take long EURUSD trades regardless of technicals.

Step 2: Agent Alignment

Agent	Direction	Confidence	Notes
Macro (derived)	Bearish	~70–75%	DXY rising, yields supportive, PMI divergence
Trend Agent	Bearish	66%	Moderate strength, TRANSITIONING regime, reduced size recommended

Both agents agree directionally (bearish) with confidence ≥60. This is the strongest foundation scenario. However, the Trend Agent flags a transitioning regime (lower TFs showing bullish rebound) and recommends reduced size — this tempers aggressive entries.

Key Trend Agent Levels:

Resistance (Invalidation): 1.17133
Support: 1.16928
VWAP: 1.16991

Step 3: HTF Bias (60-Minute)

EMA Stack:

Price (1.17063) is now above the fast EMA (1.17029) on the latest candle but well below the slow EMA (1.17212).
Fast EMA < Slow EMA → Bearish stack intact (price is attempting a counter-trend bounce into resistance)

RSI: 49.1 — Recovering from 27.6 oversold but not yet bullish territory. Neutral, consistent with a bounce within a downtrend.

MACD: Histogram turned positive (+0.00014) for the first time — the bearish momentum is decelerating, but the MACD line remains well below zero (-0.00087). This signals a corrective bounce, not a trend reversal.

Price vs. Key Levels:

Below daily 5D EMA (1.1730) — bearish
Below yesterday's low (1.17069) for most of the session — bearish
Currently testing the Trend Agent resistance zone (1.17108–1.17133)
Today's high: 1.17120, London high: 1.17100, previous session high area: 1.17108–1.17133

HTF Bias: Bearish — EMA stack is bearish, price is bouncing into resistance from an oversold condition. The move up into 1.1706–1.1712 is a counter-trend retracement.

Step 4: Calendar Gate

Time (ET)	Event	Impact	Status
8:30 AM	Unemployment Claims	Medium	✅ Released — cleared (>90 min ago)
9:45 AM	US Flash PMIs	Medium	✅ Released — cleared (~40 min ago)

No high-impact events remaining in the next 30+ minutes. The calendar is clear for trade entries.

Calendar Gate: CLEAR ✅

Step 5: LTF Entry Analysis

15-Minute Timeframe

EMA: Price above fast EMA (1.16928) on the latest bar; fast EMA still below slow EMA but converging — trend direction still "below" but price breaking above
RSI: 62.4 — Above 50, confirming short-term bullish momentum. However, this works against a short entry; we need RSI to roll over for confirmation.
MACD: Histogram turning positive with medium strength — counter-trend momentum building

15-min assessment: Price is in an active counter-trend squeeze higher. For a short setup, we need to wait for this squeeze to exhaust near resistance and RSI to turn back below 60.

5-Minute Timeframe

Latest candles (14:20–14:25): Strong bullish push, RSI 67–68, MACD histogram strong. Volume spike detected on the 14:25 candle (124 ticks vs. 44.5 avg).
EMA: Bullish cross occurred at 14:10. Fast > Slow. Price well above both EMAs.
VWAP: Price broke above VWAP on the 5m (1.16948) and has extended to upper 1SD band.
Key resistance immediately ahead: 1.17072 (today's high), then 1.17108 (60m resistance), then 1.17133 (Trend Agent invalidation).

5-min assessment: Price is surging into a heavy resistance cluster (1.1708–1.17133). The 5-min RSI at 68.6 is nearing overbought. This is a potential short entry zone if price rejects this cluster — but not yet triggered.

Step 6: Confluence Scoring — Bearish Setup (Short at Resistance)

#	Confluence Factor	Status	Score
(a)	Macro Agent bias bearish, confidence ≥60	✅ Bearish, ~70–75%	1
(b)	Trend Agent bearish, confidence ≥60	✅ Bearish, 66%	1
(c)	DXY 5-day trend confirms (rising = short)	✅ Above 5D EMA, rising 4 consecutive days	1
(d)	10Y yield trend supports (rising = short)	✅ Rising from 4.258 to 4.290+, above 5D EMA	1
(e)	60-min EMA stack supports short	✅ Fast < Slow, price below slow EMA	1
(f)	Price at VWAP, session level, or Fib on 5m	⏳ Price approaching 1.1708–1.17133 resistance cluster (60m Fib 78.6%=1.17069, session high, Trend Agent R) — not yet touching/rejecting	0 (pending)
(g)	15-min RSI confirms (<50 for shorts, no extreme)	❌ RSI at 62.4, above 50 — currently favoring longs, not shorts	0
(h)	No high-impact event within 30 minutes	✅ Calendar clear	1

Current Score: 6 of 8 confirmed, with (f) pending trigger and (g) not yet aligned.

If price reaches the 1.17108–1.17133 zone and the 15-min RSI begins turning down (next 15-min close), this becomes 7 of 8 = High Confidence.

Step 7: Setup Construction

⚠️ Setup Status: CONDITIONAL — Awaiting Trigger

Price is moving toward the short zone but has not yet provided a rejection signal. The setup is staged, not live.

Setup: SHORT EURUSD at Resistance Rejection

Parameter	Value
Bias	BEARISH (Short)
Entry Zone	1.17080 – 1.17130 (Trend Agent R 1.17133, 60m resistance 1.17108, today's high 1.17120, Fib 78.6% 1.17069)
Entry Trigger	Rejection candle on 5m at or above 1.17080: bearish engulfing, pin bar/doji with upper wick, OR price stalls and 5m RSI turns below 60 after tagging zone. If price blows through 1.17133 and holds above on a 5m close, no entry — setup is invalidated.
Stop Loss	1.17180 (above Trend Agent invalidation at 1.17133 + ~5 pip buffer for slippage/automation)
Risk (Entry to Stop)	~10 pips from mid-entry (1.17080) to stop (1.17180) = 0.00100. 60m ATR = 0.001 → stop = 1x ATR ✅

Take Profit Levels:

Target	Level	Rationale	R:R
TP1	1.16980 (1R)	VWAP ~1.16991, 60m Fib 50% area, round number — natural first reaction zone	1:1
TP2	1.16928 (1.5R)	Trend Agent Support, 60m pivot low, prior session structure	1.5:1
TP3	1.16792 (2.9R)	Today's session low, London session low — full trend continuation target	2.9:1

TP1 Assessment: VWAP at 1.16991 provides structural support at approximately 1R. TP2 at 1.16928 (Trend Agent key support) is well-defined structure at 1.5R. The profile is valid — a close TP1 with a strong TP2 at 1.5R+ and TP3 near 3R. Minimum R:R threshold of 1.5:1 is met at TP2.

Volatility/Stop Assessment: VIX at 19.09 is mildly elevated but not extreme. Today's 60m ATR is ~10 pips (0.001). The range is not compressed (today's range already ~33 pips from high to low). Stop at 1x ATR beyond the resistance cluster is appropriate — neither too tight for noise nor too wide for R:R. The structural stop at 1.17180 is within the Trend Agent invalidation zone (above 1.17133) — this is valid per the framework.

Confluences Summary (at trigger)

✅ Confirmed	❌ / ⏳ Watch
Macro bias bearish (70–75%)	(f) Pending: price must reach and reject the 1.1708–1.1713 zone
Trend Agent bearish (66%)	(g) Pending: 15m RSI must turn back below ~55 to confirm
DXY above 5D EMA, rising
10Y yields rising, supportive
60m EMA stack bearish
Calendar clear

Projected Score at Trigger: 7/8 → High Confidence (7.5–8/10)

Risks

Risk	Mitigation
Counter-trend squeeze in progress — 5m and 15m momentum is bullish; breakout above 1.17133 would invalidate	Hard stop at 1.17180; do not enter without rejection candle confirmation
Trend Agent flags TRANSITIONING regime — recommends reduced size	Scale position to 0.5–0.75x normal risk per Trend Agent guidance
Low volume on the rally (5m volume declining except one spike)	Actually supportive of the short thesis — suggests the bounce lacks conviction
Post-PMI reaction may extend — US PMI beat may still be digesting	If price breaks and holds above 1.17133, the macro thesis needs reassessment
VWAP magnet — price above VWAP may pull back to it, but VWAP itself can act as support and reject shorts	Trail stop to breakeven once TP1 is reached; close partial position

Final Recommendation

CONDITIONAL SHORT EURUSD — Setup is staged at the 1.17080–1.17130 resistance cluster. Do not enter until a 5-minute rejection candle prints within or above this zone. If price breaks above 1.17133 on a 5m close, walk away — the setup is void.

Reduce position size to 0.5–0.75x standard risk given the transitioning regime. Manage TP1 at VWAP/1.16980, TP2 at 1.16928, TP3 at 1.16792.

If price does not reach the entry zone and instead rolls over from current levels (~1.1706), do not chase — the entry zone discipline is non-negotiable for an automated system.

What to Watch in Week 3

Week 3 starts Monday April 27 and runs through Friday May 1. The macro calendar carries three pivot points: the FOMC decision on Wednesday, advance Q1 GDP on Thursday, and the April nonfarm payrolls on Friday. Each is potentially regime-defining on its own. Stacked into a single week — and on top of an unresolved Trump-Iran headline tape — the setup is the most consequential calendar of the season so far.

The question for both models is regime adaptation. Week 2 demonstrated that the macro frame from Week 1 (disinflation, dollar weakness) no longer holds. The new frame is headline-driven, correlation-broken, and intraday whipsawy. A model that recognizes that and trades less should outperform a model that runs the Week-1 playbook into a Week-2 tape. Through two weeks, GPT has been more selective; Claude has been more active. Whether that posture inverts in Week 3 will tell us how each model is reading the regime.

Three specific watchpoints: First, does Claude continue to take indices long into Friday sessions, or does the Week-2 result cause a pullback? Second, does GPT take a trade Wednesday into the FOMC, or sit it out and trade only the post-decision tape? Third, do the models converge on a same-direction trade — and if they do, what is the instrument? Same-direction overlap has been absent across the first two weeks.

The benchmark itself remains intact. Two weeks down, both models within 2% of starting capital, no blow-ups, and a season scorecard that is actually getting interesting. Week 3 either separates the models or keeps the gap tight. Either outcome is reportable.

Frequently Asked Questions

Who won Week 2 of the AI Trading Benchmark?
Neither. Both models closed Week 2 in the red. Claude finished at $50,610.09 (-$348.42 for the week), GPT at $49,257.72 (-$742.28). Claude had the smaller loss, but the headline of Week 2 is that both AIs survived the Trump-Iran Friday whipsaw without a blow-up.

What was the Trade of the Week?
Claude's EURUSD short on Wednesday April 23. Entry 1.17105, exit 1.1679, +1.67R for +$1,606.38. The setup played a textbook dollar-firming thesis, and price reached all three take-profit levels within roughly six hours. It was the only print above +1.5R across both models all week.

How did the AI models handle the Trump-Iran Friday volatility?
Two trades total across both models on Friday April 24. Claude lost its US500 short to a stop (-$940). GPT took one trade, a NAS100 long that exited at TP1 for +$1,149.12, the only green print on the day. Combined Friday P&L: +$209.12, a small net positive for the benchmark on a session that chopped retail.

What is the AI Trading Benchmark season score after two weeks?
Through two weeks and 13 total trades, Claude leads on every aggregate stat. Claude: 10 trades, 50.0% win rate, +0.90 net R, $50,610.09 closing balance (+1.22% season return). GPT: 3 trades, 33.3% win rate, -0.98 net R, $49,257.72 closing balance (-1.48% season return). Claude leads by $1,352.37 in P&L.

Did GPT trade less than Claude this week?
Not this week. GPT took 3 trades and Claude took 3, and both placed two on Wednesday and one on Friday. The gap is in the season totals (Claude 10 trades, GPT 3), because GPT's Week 1 ledger was empty after the scope refinement. Within Week 2, GPT lost both Wednesday trades, sat out Thursday, and took a single winner on Friday, a cadence the editorial reads as selectivity in chop regimes.

How is the AI Trading Benchmark methodology different from a backtest?
Every trade in the benchmark is real broker execution on a Pepperstone demo account. The models output entry, stop, and three take-profit targets. The broker fills the orders. The ledger records actual P&L. There is no curve-fitting and there are no idealized fills, just two AI models trading the same instruments under the same risk framework, head-to-head.

Methodology

This weekly editorial aggregates trading results from April 20-24, 2026. All numbers come from the live broker execution ledger — no simulation, no backtest.

How P&L is computed. Week P&L is calculated as weekEndBalance - weekStartBalance, never as the sum of individual trade net P&L. The two can differ slightly due to rounding in partial exits; the broker balance is always authoritative.

Week rollover. Each week's starting balance is the previous week's ending balance. Week 1 uses the experiment's initial capital ($50,000 per model). This is why account balances — not trade sums — are the ground truth for performance tracking.
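As a rough illustration of the two rules above, the sketch below derives weekly P&L and season return from closing balances alone. The function names are hypothetical, and Claude's Week 1 closing balance ($50,958.51) is not quoted directly in this editorial; it is back-derived here from the Week 2 close and the reported weekly P&L.

```python
from decimal import Decimal

INITIAL_CAPITAL = Decimal("50000.00")  # per model, from the methodology


def weekly_pnls(week_end_balances):
    """Week P&L = week end balance minus week start balance.

    Each week's start is the previous week's end; Week 1 starts
    from the initial capital.
    """
    pnls = []
    start = INITIAL_CAPITAL
    for end in week_end_balances:
        pnls.append(end - start)
        start = end  # rollover into the next week
    return pnls


def season_return_pct(balance):
    """Season return as a percentage of initial capital."""
    return round((balance - INITIAL_CAPITAL) / INITIAL_CAPITAL * 100, 2)


# Week-end balances: [Week 1, Week 2]. Claude's Week 1 value is derived (see above).
claude = [Decimal("50958.51"), Decimal("50610.09")]
gpt = [Decimal("50000.00"), Decimal("49257.72")]

print(weekly_pnls(claude), season_return_pct(claude[-1]))
print(weekly_pnls(gpt), season_return_pct(gpt[-1]))
```

Using `Decimal` rather than floats keeps the cent-level figures exact, which matters when the balance is the ground truth.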

Net R vs. Net P&L. Net R is a risk-adjusted measure (sum of each trade's reward/risk multiple). Net P&L is the literal dollar change in account balance. Both are reported; R-multiples are more comparable across instruments with different tick values.
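A small sketch of the distinction, using the Week 2 trades quoted in this editorial. The tuple layout and helper names are illustrative assumptions, not the ledger's actual schema.

```python
# (instrument, realized R, net P&L in USD) for Week 2, as quoted in this editorial.
WEEK2_TRADES = {
    "Claude": [("NAS100", -1.0, -1014.80), ("EURUSD", 1.67, 1606.38), ("US500", -1.0, -940.00)],
    "GPT": [("US30", -1.0, -1101.40), ("US500", -1.0, -790.00), ("NAS100", 1.02, 1149.12)],
}


def net_r(trades):
    """Sum of per-trade reward/risk multiples."""
    return round(sum(r for _, r, _ in trades), 2)


def net_pnl(trades):
    """Sum of per-trade dollar results."""
    return round(sum(p for _, _, p in trades), 2)


for model, trades in WEEK2_TRADES.items():
    print(model, net_r(trades), net_pnl(trades))

# Every stop-out is -1R, yet the dollar size of a stop differs by instrument.
stop_dollars = sorted(p for trades in WEEK2_TRADES.values() for _, r, p in trades if r == -1.0)
print(stop_dollars)
```

The four stops range from -$790.00 to -$1,101.40 in dollars but are all exactly -1R, which is why R-multiples compare more cleanly across instruments.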

Weekend handling. Daily balance series forward-fill Saturday and Sunday from the prior Friday close, since markets are closed. This keeps chart visuals continuous without fabricating activity.
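Forward-filling can be sketched as follows. The helper name is hypothetical, and the daily closes for GPT are reconstructed from the per-trade results in this editorial (two Wednesday losses, one Friday win) rather than copied from the ledger.

```python
from datetime import date, timedelta
from decimal import Decimal


def forward_fill_weekend(daily_closes):
    """Copy each Friday close onto the following Saturday and Sunday.

    daily_closes maps trading dates to closing balances. Existing
    weekend entries, if any, are left untouched.
    """
    filled = dict(daily_closes)
    for day, balance in daily_closes.items():
        if day.weekday() == 4:  # Monday is 0, so 4 is Friday
            for offset in (1, 2):
                filled.setdefault(day + timedelta(days=offset), balance)
    return dict(sorted(filled.items()))


# GPT daily closes for Apr 20-24, 2026 (reconstructed, see lead-in).
gpt_closes = {
    date(2026, 4, 20): Decimal("50000.00"),
    date(2026, 4, 21): Decimal("50000.00"),
    date(2026, 4, 22): Decimal("48108.60"),
    date(2026, 4, 23): Decimal("48108.60"),
    date(2026, 4, 24): Decimal("49257.72"),
}

series = forward_fill_weekend(gpt_closes)
for day, balance in series.items():
    print(day.isoformat(), balance)
```

The Saturday and Sunday rows carry Friday's $49,257.72 unchanged, so the chart stays continuous without implying any weekend trading.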

Methodology stability. Rules don't change mid-phase. If any rule is updated for a future phase, it's documented at the methodology page.

Scope refinement. This editorial was retroactively updated on 2026-05-12 to remove XAUUSD and USDJPY from the experiment universe.

Two weeks down, both AIs still in the game, and the season is starting to look like it actually wants to test something interesting. Week 3 is the FOMC and NFP back-to-back. If the models can hold their lines through that calendar, the benchmark is officially past its proving phase. Watch the Wednesday and Friday tapes. — Eduardo, Senior Research Editor