Prompt used to regenerate this page:
Page: Q-Learning
Description: "An agent learns to navigate a grid through reinforcement learning"
Category: artificial-intelligence
Icon: brain
Tags: reinforcement, q-learning, grid
Status: new
Front matter (index.md):
title: "Q-Learning"
description: "An agent learns to navigate a grid through reinforcement learning"
icon: "brain"
tags: ["reinforcement", "q-learning", "grid"]
status: ["new"]
HTML structure (index.md):
<section class="grid-container" id="grid-wrapper">
<canvas id="grid-canvas"></canvas>
</section>
Widget files:
- _stats.right.md (weight: 10): ##### Statistics
<dl class="qlearn-stats"> with:
- Episode: dd#stat-episode (initial "0")
- Steps: dd#stat-steps (initial "0")
- Reward: dd#stat-reward (initial "0")
- Epsilon: dd#stat-epsilon (initial "1.0")
- _controls.right.md (weight: 20): ##### Controls
<div class="qlearn-controls"> with:
{{< button id="btn-play" icon="play" aria="Play" class="is-start" >}}
{{< button id="btn-play" icon="pause" aria="Pause" class="is-stop" >}}
{{< button id="btn-step" icon="skip-forward" aria="Step" >}}
{{< button id="btn-reset-q" icon="refresh" aria="Reset Q-Table" >}}
Note: the play and pause buttons intentionally share id="btn-play"; the .is-running class on .qlearn-controls toggles which one is visible.
Sliders:
- Speed: input#slider-speed type=range min=1 max=100 value=10
- Learning Rate (alpha): input#slider-alpha type=range min=0.01 max=1 step=0.01 value=0.1
- Discount (gamma): input#slider-gamma type=range min=0.1 max=0.99 step=0.01 value=0.9
- Epsilon (epsilon): input#slider-epsilon type=range min=0.1 max=1 step=0.01 value=1.0
Checkbox:
- input#check-decay checked: "Auto decay epsilon"
- _source.after.md (weight: 90): Explains the Q-Learning algorithm: the Q-Table, the update rule Q(s,a) = Q(s,a) + alpha * (r + gamma * max_a'(Q(s',a')) - Q(s,a)), epsilon-greedy exploration, and episodes running from the top-left start to the bottom-right goal. Clicking a cell cycles its type: empty -> wall -> trap -> goal.
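The update rule above can be sketched as a single function; the name qUpdate is illustrative, not from the spec:

```javascript
// Q(s,a) <- Q(s,a) + alpha * (r + gamma * max_a' Q(s',a') - Q(s,a))
function qUpdate(currentQ, reward, maxNextQ, alpha, gamma) {
  return currentQ + alpha * (reward + gamma * maxNextQ - currentQ);
}

// Example: stepping into the goal (reward 10) from a zero-initialized state
// with alpha = 0.1 moves Q one tenth of the way toward the target.
const q = qUpdate(0, 10, 0, 0.1, 0.9); // 0 + 0.1 * (10 + 0 - 0) = 1
```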
Architecture (single file default.js):
IIFE, imports: panic from '/_lib/panic_v3.js'
Constants:
GRID_SIZE=8, MAX_STEPS_PER_EPISODE=200
ACTIONS: [{dx:0,dy:-1,label:'up-arrow'}, {dx:0,dy:1,label:'down-arrow'}, {dx:-1,dy:0,label:'left-arrow'}, {dx:1,dy:0,label:'right-arrow'}]
Cell types: CELL_EMPTY='empty', CELL_WALL='wall', CELL_TRAP='trap', CELL_GOAL='goal', CELL_START='start'
Rewards: REWARD_STEP=-0.1, REWARD_GOAL=10, REWARD_TRAP=-10
Heatmap colors: COLOR_NEGATIVE={r:255,g:68,b:68}, COLOR_NEUTRAL={r:255,g:255,b:255}, COLOR_POSITIVE={r:68,g:255,b:68}
CELL_CYCLE=[CELL_EMPTY, CELL_WALL, CELL_TRAP, CELL_GOAL]
Cell class:
constructor(type='empty'): stores type.
getReward(): goal=10, trap=-10, default=-0.1.
isTerminal(): true if goal or trap.
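A minimal sketch of the Cell class as specified, using the reward constants listed above:

```javascript
const REWARD_STEP = -0.1, REWARD_GOAL = 10, REWARD_TRAP = -10;

class Cell {
  constructor(type = 'empty') {
    this.type = type;
  }
  // goal = 10, trap = -10, everything else costs a small step penalty.
  getReward() {
    if (this.type === 'goal') return REWARD_GOAL;
    if (this.type === 'trap') return REWARD_TRAP;
    return REWARD_STEP;
  }
  // An episode ends when the agent reaches a goal or a trap.
  isTerminal() {
    return this.type === 'goal' || this.type === 'trap';
  }
}
```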
Grid class:
constructor(): creates 8x8 Cell array. Default layout:
Start: (0,0). Goal: (7,7).
Traps: (3,2), (5,4), (2,6), (6,1).
Walls: (1,1), (2,3), (3,3), (4,5), (5,5), (6,2).
getCell(x,y): returns Cell or null if out of bounds.
setCell(x,y,type): sets Cell at position.
isWall(x,y): true if null or wall.
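The Grid bounds handling can be sketched as follows; cells are plain strings here for brevity instead of Cell instances:

```javascript
const GRID_SIZE = 8;

class Grid {
  constructor() {
    // 8x8 array, all empty; the real constructor also places the
    // default start, goal, traps, and walls listed above.
    this.cells = Array.from({ length: GRID_SIZE }, () =>
      Array.from({ length: GRID_SIZE }, () => 'empty'));
  }
  getCell(x, y) {
    if (x < 0 || y < 0 || x >= GRID_SIZE || y >= GRID_SIZE) return null;
    return this.cells[y][x];
  }
  setCell(x, y, type) {
    if (this.getCell(x, y) !== null) this.cells[y][x] = type;
  }
  // Out-of-bounds counts as a wall, so the agent can never leave the grid.
  isWall(x, y) {
    const cell = this.getCell(x, y);
    return cell === null || cell === 'wall';
  }
}
```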
Agent class:
constructor(startX=0, startY=0): stores start position and current position.
move(actionIndex, grid): computes new position from ACTIONS[actionIndex]. If wall/OOB, stays. Returns {x, y, moved}.
reset(): returns to start position.
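A sketch of Agent.move as described: a blocked move (wall or out of bounds) leaves the agent in place. The grid argument is assumed to expose the isWall API above:

```javascript
const ACTIONS = [
  { dx: 0, dy: -1 }, { dx: 0, dy: 1 },   // up, down
  { dx: -1, dy: 0 }, { dx: 1, dy: 0 },   // left, right
];

class Agent {
  constructor(startX = 0, startY = 0) {
    this.startX = startX; this.startY = startY;
    this.x = startX; this.y = startY;
  }
  move(actionIndex, grid) {
    const { dx, dy } = ACTIONS[actionIndex];
    const nx = this.x + dx, ny = this.y + dy;
    const moved = !grid.isWall(nx, ny);
    if (moved) { this.x = nx; this.y = ny; }
    return { x: this.x, y: this.y, moved };
  }
  reset() { this.x = this.startX; this.y = this.startY; }
}
```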
QLearning class:
constructor(gridSize, numActions): creates 3D Q-table [y][x][action] initialized to 0.
_createTable(): returns 3D zero array.
chooseAction(x, y, epsilon): epsilon-greedy. Random < epsilon -> random action, else getBestAction.
update(x, y, action, reward, nextX, nextY, alpha, gamma, terminal): Bellman equation. currentQ + alpha * (reward + gamma * maxNextQ - currentQ). maxNextQ=0 if terminal.
getBestAction(x, y): argmax over Q-values for state.
getMaxQ(x, y): max Q-value for state.
reset(): reinitializes Q-table to zeros.
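The QLearning class above can be sketched as a zero-initialized [y][x][action] table with an epsilon-greedy policy and the Bellman update:

```javascript
class QLearning {
  constructor(gridSize, numActions) {
    this.gridSize = gridSize;
    this.numActions = numActions;
    this.table = this._createTable();
  }
  _createTable() {
    return Array.from({ length: this.gridSize }, () =>
      Array.from({ length: this.gridSize }, () =>
        new Array(this.numActions).fill(0)));
  }
  // With probability epsilon explore randomly, otherwise exploit.
  chooseAction(x, y, epsilon) {
    if (Math.random() < epsilon) {
      return Math.floor(Math.random() * this.numActions);
    }
    return this.getBestAction(x, y);
  }
  getBestAction(x, y) {
    const q = this.table[y][x];
    let best = 0;
    for (let a = 1; a < q.length; a++) if (q[a] > q[best]) best = a;
    return best;
  }
  getMaxQ(x, y) {
    return Math.max(...this.table[y][x]);
  }
  // Bellman update; the successor's value is 0 on terminal transitions.
  update(x, y, action, reward, nextX, nextY, alpha, gamma, terminal) {
    const maxNextQ = terminal ? 0 : this.getMaxQ(nextX, nextY);
    const q = this.table[y][x][action];
    this.table[y][x][action] = q + alpha * (reward + gamma * maxNextQ - q);
  }
  reset() { this.table = this._createTable(); }
}
```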
Renderer class:
constructor(canvas): gets 2d context, dpr. State: agentX/Y, pulse animation (pulseTimer, pulseX/Y, pulseType, pulseAlpha). Cached colors: background, border, wall, text, trapTint, goalTint, startTint, agent.
cacheColors(): reads --background-color-surface, --draw-color-surface, --text-color-secondary, --text-color-primary, --draw-color-primary from CSS.
initSize(): HiDPI support. Logical size = min(container.clientWidth, 494). Sets canvas buffer to logical*dpr, CSS display size to logical, ctx.setTransform(dpr).
width/height getters: canvas buffer / dpr.
cellSize getter: width / GRID_SIZE.
render(grid, qLearning, agent): clears canvas with border color. Finds maxAbsQ for normalization. Draws each cell: _drawCell (heatmap bg) + _drawArrow (best action). Then pulse overlay, then agent circle.
renderAgent/renderQValues/renderArrows: no-op methods kept for API compatibility; render() handles all drawing.
pulseCell(x, y, type): sets pulse state, alpha=0.6, clears after 200ms timeout.
updateCellType(x, y, type): no-op (grid data is source of truth).
eventToGrid(event): converts click to grid coords via getBoundingClientRect / cellSize.
_drawCell(ctx, cx, cy, cw, ch, cell, qLearning, gx, gy, maxAbsQ): walls get wall color. Others: heatmap from normalized Q (maxQ/maxAbsQ clamped -1..1). Then tint overlay for trap (red), goal (green), start (blue).
_drawArrow(ctx, cx, cy, cw, ch, cell, qLearning, gx, gy): skips walls/goal/trap. Skips if all Q=0. Draws Unicode arrow for best action. fontSize = max(12, cw*0.35), alpha 0.7.
_drawPulse(ctx, size, gap): colored flash overlay (green for goal, red for trap) with pulseAlpha.
_drawAgent(ctx, agent, size): circle at cell center, radius=6, agent color.
_getMaxAbsQ(qLearning, grid): scans all non-wall cells, returns max absolute Q (min 1).
_interpolateColor(value): value in [-1,1]. Negative: red->white. Positive: white->green. Returns rgb() string.
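The heatmap interpolation in _interpolateColor can be sketched as a linear blend from the neutral color toward red (negative) or green (positive), using the color constants above:

```javascript
const COLOR_NEGATIVE = { r: 255, g: 68, b: 68 };
const COLOR_NEUTRAL  = { r: 255, g: 255, b: 255 };
const COLOR_POSITIVE = { r: 68, g: 255, b: 68 };

function interpolateColor(value) {
  // Clamp to [-1, 1], then blend neutral -> negative or neutral -> positive.
  const v = Math.max(-1, Math.min(1, value));
  const to = v < 0 ? COLOR_NEGATIVE : COLOR_POSITIVE;
  const t = Math.abs(v);
  const r = Math.round(COLOR_NEUTRAL.r + (to.r - COLOR_NEUTRAL.r) * t);
  const g = Math.round(COLOR_NEUTRAL.g + (to.g - COLOR_NEUTRAL.g) * t);
  const b = Math.round(COLOR_NEUTRAL.b + (to.b - COLOR_NEUTRAL.b) * t);
  return `rgb(${r}, ${g}, ${b})`;
}
```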
Simulation state: grid, agent, qLearning, renderer. episodeCount, stepCount, totalReward. Parameters: speed=10, alpha=0.1, gamma=0.9, epsilon=1.0, decayEnabled=true. isRunning, timerId.
Simulation logic:
step(): chooseAction(epsilon-greedy), move agent, get reward, check terminal (goal/trap/maxSteps), update Q-table, pulse on terminal, render. If terminal: endEpisode(). Returns boolean.
endEpisode(): increments episodeCount. Epsilon decay: epsilon *= 0.995 (min 0.01) if decayEnabled. Updates slider display. Resets agent/stepCount/totalReward. Renders.
runSimulation(): stepsPerTick = max(1, floor(speed/10)). Runs steps, schedules setTimeout with interval = max(10, 1000/speed).
togglePlay(): toggles isRunning, .is-running on .qlearn-controls. Starts/stops simulation.
singleStep(): only when paused. Calls step() + updateStats().
resetQTable(): stops if running. Resets Q-table, agent, counters. Reads epsilon from slider.
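The pacing and epsilon-decay arithmetic above can be sketched as small pure helpers (the function names are illustrative):

```javascript
// Higher speed batches more steps per tick: 1 at speed 1-19, up to 10 at 100.
function stepsPerTick(speed) {
  return Math.max(1, Math.floor(speed / 10));
}

// setTimeout interval shrinks with speed but never below 10 ms.
function tickInterval(speed) {
  return Math.max(10, 1000 / speed);
}

// Multiplicative decay at episode end, floored at 0.01.
function decayEpsilon(epsilon) {
  return Math.max(0.01, epsilon * 0.995);
}
```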
UI helpers:
updateStats(): updates stat-episode, stat-steps, stat-reward (toFixed(1)), stat-epsilon (toFixed(3)).
updateSliderDisplay(sliderId, value): sets slider.value and #sliderId-value textContent.
handleCellClick(event): only when paused. eventToGrid, bounds check, protect (0,0). Cycles type through CELL_CYCLE. Updates grid, renders.
bindSliders(): input events for slider-speed/alpha/gamma/epsilon + check-decay checkbox.
bindControls(): click events for btn-play (togglePlay), btn-step (singleStep), btn-reset-q (resetQTable).
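The cell-type cycling used by handleCellClick can be sketched as a lookup in CELL_CYCLE (empty -> wall -> trap -> goal -> empty); the helper name is illustrative:

```javascript
const CELL_CYCLE = ['empty', 'wall', 'trap', 'goal'];

function nextCellType(current) {
  const i = CELL_CYCLE.indexOf(current);
  // Wraps around after 'goal'; unknown types restart the cycle at 'empty'.
  return CELL_CYCLE[(i + 1) % CELL_CYCLE.length];
}
```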
Initialization:
init(): gets grid-canvas. Creates Grid, Agent(0,0), QLearning(8,4), Renderer(canvas). cacheColors, initSize, render. Binds controls + sliders + canvas click. MutationObserver on documentElement data-theme attribute for recoloring. updateStats.
Auto-init: readyState check, DOMContentLoaded fallback.
SCSS file (default.scss):
.grid-container: flex, justify center, max-width 100%
#grid-canvas: cursor pointer
layout-main scope:
.qlearn-stats: flex, justify center, gap 2rem, flex-wrap
.stat: flex column centered, gap 0.25rem
.label: 0.75rem, uppercase, muted
.value: 1.5rem, weight 600, tabular-nums
.qlearn-controls: flex row nowrap, justify center, gap 0.5rem
.is-start: display block (visible)
.is-stop: display none (hidden)
&.is-running: .is-start none, .is-stop block
.slider-group: flex row, align center, gap 0.5rem
label: 0.85rem, primary text
input[type="range"]: accent-color primary
.check-group: flex row, align center, gap 0.5rem
checkbox: accent-color primary, pointer
label: 0.85rem, primary text, pointer
Page entirely generated and maintained by AI, with no human intervention.