Run training and watch jobs
Studio’s Home page (#/) is split into two columns: Run training on the left, the Jobs list on the right. Clicking a job in the list navigates to Job detail (#/jobs/:id), which hosts the live event stream and the loss chart.
Run training
The Run training panel calls /api/manifest once when the page loads. The response is the project’s createArkor({ trainer }) summary, which the panel uses to label the action:
- "Run training: <trainer name>" once a trainer is found.
- "No trainer in src/arkor/index.ts yet. Add createTrainer(...) and pass it to createArkor." if the bundle imported but exposed nothing.
- "Couldn't read manifest: <error>" if the build itself failed (a typo in src/arkor/, for example).
The button is disabled while a run is in flight and while no trainer has resolved.
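The label and disabled rules above can be sketched as a small pure function. The `ManifestState` shape, the `runButton` name, and the label shown while the manifest is still loading are assumptions made for illustration; the real `/api/manifest` response may be shaped differently.

```typescript
// Hypothetical model of the manifest outcomes described above.
type ManifestState =
  | { kind: "loading" }
  | { kind: "ok"; trainerName: string | null }
  | { kind: "error"; message: string };

interface RunButton {
  label: string;
  disabled: boolean;
}

function runButton(manifest: ManifestState, runInFlight: boolean): RunButton {
  switch (manifest.kind) {
    case "ok":
      if (manifest.trainerName !== null) {
        // A trainer resolved: only an in-flight run disables the button.
        return { label: `Run training: ${manifest.trainerName}`, disabled: runInFlight };
      }
      // The bundle imported but exposed no trainer.
      return {
        label:
          "No trainer in src/arkor/index.ts yet. Add createTrainer(...) and pass it to createArkor.",
        disabled: true,
      };
    case "error":
      // The build itself failed.
      return { label: `Couldn't read manifest: ${manifest.message}`, disabled: true };
    case "loading":
      // Assumed placeholder label; the source does not say what is shown here.
      return { label: "Run training", disabled: true };
  }
}
```

Keeping the rule pure makes it easy to check every state without rendering the panel.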
When you click it, Studio sends POST /api/train. The backend spawns arkor start in a subprocess and streams its stdout / stderr back as raw text. The pre-formatted log box auto-scrolls; what you see is exactly what the spawned arkor start would print in a terminal.
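A minimal sketch of consuming that raw text stream in the browser follows. Only `POST /api/train` comes from the description above; `LogDecoder`, `runTraining`, and the `onText` callback are hypothetical names, and Studio's own client code may differ.

```typescript
// Accumulates streamed bytes as text. `stream: true` keeps a multi-byte
// UTF-8 character intact when it is split across two chunks.
class LogDecoder {
  private readonly decoder = new TextDecoder("utf-8");
  private text = "";

  push(chunk: Uint8Array): string {
    this.text += this.decoder.decode(chunk, { stream: true });
    return this.text;
  }
}

// Assumed wiring: start a run and hand the growing log to `onText`,
// which would update the pre-formatted log box.
async function runTraining(onText: (log: string) => void): Promise<void> {
  const response = await fetch("/api/train", { method: "POST" });
  if (!response.ok || response.body === null) {
    throw new Error(`POST /api/train failed with status ${response.status}`);
  }
  const reader = response.body.getReader();
  const log = new LogDecoder();
  for (;;) {
    const result = await reader.read();
    if (result.done) break;
    onText(log.push(result.value));
  }
}
```

Reading from `response.body` rather than awaiting `response.text()` is what lets output appear while the subprocess is still running.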
There is no input form for picking the trainer or passing flags: Studio always runs the trainer registered through createArkor, and arkor start reuses .arkor/build/index.mjs if it already exists. Edits to src/arkor/ are not picked up automatically across multiple clicks on the same page; reload the Run training page (or run arkor build from a terminal) between edits and the next click. See CLI § build / start for the precise rebuild rules.
What “first run” looks like
Click Run training. Two phases follow:
- GPU allocation. The job appears as Warming up GPU. The loss chart shows a Waiting for GPU placeholder, the events list is empty, and the Phase row in the metadata sidebar reads Warming up GPU while the GPU warm-up timer ticks. This phase varies in length: typically under a minute when a worker is still warm from a recent job, occasionally several minutes when one has to start from cold. See the Quickstart for why this happens.
- The training run. When training.started arrives, the status flips to Running, the loss chart starts updating from training.log frames, and the Phase row reads Training run. This is the 7 to 12 minute window in the template table on the Quickstart.
Click Run training with the browser tab focused. Studio asks the browser for notification permission and surfaces a desktop toast plus a tab-title indicator when the job completes or fails, so you do not have to watch the tab.
Jobs list
The Jobs list polls GET /api/jobs once at mount, then every 5 seconds. There is no manual refresh button; the interval is fixed.
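The "once at mount, then every 5 seconds" schedule can be sketched as below. `startPolling` and the `Timer` interface are hypothetical; the timer is injected only so the schedule can be checked without waiting on a real clock.

```typescript
// Hypothetical timer abstraction; in the browser it would wrap
// window.setInterval and window.clearInterval.
interface Timer<H> {
  set(fn: () => void, ms: number): H;
  clear(handle: H): void;
}

const POLL_INTERVAL_MS = 5000; // fixed interval from the description above

// Runs `tick` immediately, then on every interval. Returns a stop
// function to call when the list unmounts.
function startPolling<H>(
  tick: () => void,
  timer: Timer<H>,
  intervalMs: number = POLL_INTERVAL_MS,
): () => void {
  tick();
  const handle = timer.set(tick, intervalMs);
  return () => timer.clear(handle);
}

// Possible browser usage (refreshJobs is a hypothetical function that
// fetches GET /api/jobs and updates the list):
//   const stop = startPolling(refreshJobs, {
//     set: (fn, ms) => window.setInterval(fn, ms),
//     clear: (h) => window.clearInterval(h),
//   });
```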
| Column | Source |
|---|---|
| Status | Job.status (queued / running / completed / failed / cancelled) plus one Studio-derived display state on the list: a queued job whose createdAt is within the last 90 seconds renders as Warming up GPU. The list page only polls /api/jobs; Job detail uses the same rule and additionally consults the live SSE stream. The cell carries a CSS class for colouring. |
| Name | Job.name. Links to #/jobs/<id>. |
| Created | new Date(Job.createdAt).toLocaleString(). |
| ID | Job.id, monospaced. |
The list shows whatever order the backend returned. There is no client-side filter, search, or pagination. When the project has no jobs yet, the panel reads No jobs yet..
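The Studio-derived display state in the Status row can be sketched as a pure function. The `Job` shape below is an assumption based on the columns in the table (in particular, `createdAt` is assumed to be an ISO timestamp string), and whether the 90-second boundary is inclusive is not stated in the source.

```typescript
type JobStatus = "queued" | "running" | "completed" | "failed" | "cancelled";

// Assumed subset of the Job record used by the list.
interface Job {
  id: string;
  name: string;
  status: JobStatus;
  createdAt: string; // assumed ISO 8601 timestamp
}

const WARM_UP_WINDOW_MS = 90_000;

// Returns the text shown in the Status cell. `now` is injectable so the
// rule can be checked deterministically.
function displayStatus(
  job: Pick<Job, "status" | "createdAt">,
  now: number = Date.now(),
): JobStatus | "Warming up GPU" {
  const ageMs = now - new Date(job.createdAt).getTime();
  // Boundary treated as inclusive here; that detail is an assumption.
  if (job.status === "queued" && ageMs <= WARM_UP_WINDOW_MS) {
    return "Warming up GPU";
  }
  return job.status;
}
```

A queued job older than the window falls back to plain queued, and every other status is shown as returned by the backend.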
Job detail
#/jobs/:id opens a Server-Sent Events connection to GET /api/jobs/:id/events via EventSource. The page listens for five named events plus a stream sentinel (end).
In every row below, the event log renders the event name in its own column next to the message; the message text shown is what JobDetail.pushEvent() builds from the SSE payload.
| Event | Effect on the page |
|---|---|
| training.started | Status flips to running. The event log row’s message is the raw JSON payload (no per-event formatting). |
| training.log | The step, loss, and (when present) evalLoss are appended to the loss chart’s data array. The event log message starts with step=<n> and appends loss=<value> and/or evalLoss=<value> for whichever fields are numeric on that step. Either segment is omitted when its field is null or missing, so eval-only frames render as step=<n> evalLoss=<value>. |
| checkpoint.saved | The event log message is step=<n>. The chart is not affected. |
| training.completed | Status flips to completed. The event log message is <n> artifact[s]; the artifact count itself is rendered in the Metadata sidebar’s Artifacts row. |
| training.failed | Status flips to failed. The event log message is the error string from the payload, and a red banner with the same text is rendered above the chart. |
| end | The page closes the EventSource. No reconnect. |
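The training.log message rule from the table can be illustrated with a short formatter. `TrainingLogPayload` and `formatTrainingLog` are hypothetical names, and the number formatting (plain string conversion) is an assumption; the real JobDetail.pushEvent() may round or pad values.

```typescript
// Assumed payload shape for a training.log frame.
interface TrainingLogPayload {
  step: number;
  loss?: number | null;
  evalLoss?: number | null;
}

// Builds "step=<n>" and appends a segment for each numeric field.
function formatTrainingLog(payload: TrainingLogPayload): string {
  const parts = [`step=${payload.step}`];
  if (typeof payload.loss === "number") {
    parts.push(`loss=${payload.loss}`);
  }
  if (typeof payload.evalLoss === "number") {
    parts.push(`evalLoss=${payload.evalLoss}`);
  }
  return parts.join(" ");
}
```

With this sketch, an eval-only frame such as `{ step: 20, evalLoss: 0.75 }` produces `step=20 evalLoss=0.75`.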
Stream errors are surfaced as a separate Event stream interrupted. banner above the loss-chart and events-log cards (not as a log entry); the banner clears the next time an SSE frame arrives, and reconnect is left to the browser’s EventSource retry behaviour.
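The subscription behaviour described above (five named events, the end sentinel, and the interruption banner) might be wired roughly as follows. `EventSourceLike` is a hypothetical stand-in that mirrors the small part of the browser EventSource API this sketch uses, and `StreamHandlers` is likewise an assumed callback shape rather than Studio's actual component interface.

```typescript
// Hypothetical minimal stand-in for the browser EventSource.
interface EventSourceLike {
  addEventListener(type: string, listener: (event: { data: string }) => void): void;
  onerror: ((event: unknown) => void) | null;
  close(): void;
}

const NAMED_EVENTS = [
  "training.started",
  "training.log",
  "checkpoint.saved",
  "training.completed",
  "training.failed",
] as const;

// Assumed callbacks into the page state.
interface StreamHandlers {
  onEvent(name: string, data: string): void;
  onInterrupted(interrupted: boolean): void;
}

function subscribe(source: EventSourceLike, handlers: StreamHandlers): void {
  for (const name of NAMED_EVENTS) {
    source.addEventListener(name, (event) => {
      handlers.onInterrupted(false); // an arriving frame clears the banner
      handlers.onEvent(name, event.data);
    });
  }
  // The sentinel closes the stream, which also stops browser retries.
  source.addEventListener("end", () => source.close());
  // Errors only raise the banner; reconnecting is left to the source.
  source.onerror = () => handlers.onInterrupted(true);
}
```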
The event log keeps only the last 500 entries (older entries drop off as new ones arrive). It is a scrolling list rendered from named SSE events, intended for quick inspection rather than full forensic logs; for the complete history, look at the cloud-api directly.
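One simple way to keep only the most recent 500 entries is to trim on every append, as in the sketch below. `LogEntry` and `appendCapped` are illustrative names; the source states the cap but not how it is implemented.

```typescript
const MAX_LOG_ENTRIES = 500;

// Assumed shape of one row in the event log.
interface LogEntry {
  event: string;
  message: string;
}

// Returns a new array containing the previous entries plus `entry`,
// dropping the oldest ones once the cap is exceeded.
function appendCapped(
  log: readonly LogEntry[],
  entry: LogEntry,
  max: number = MAX_LOG_ENTRIES,
): LogEntry[] {
  const next = [...log, entry];
  return next.length > max ? next.slice(next.length - max) : next;
}
```

Returning a fresh array rather than mutating in place suits state containers that compare by reference, at the cost of a copy per event.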
The loss chart is an SVG plot drawn from training.log events. It uses min-max scaling on the y-axis and the step number on the x-axis, and shows up to two series:
- Training loss: solid teal line, one vertex per event with a numeric loss.
- Eval loss: dashed pink line with point markers, drawn from events that carry a numeric evalLoss (typically every evalSteps ticks). The series is built from the events directly, so eval-only frames (numeric evalLoss with loss omitted) still appear in the line, legend, and stats. The legend hides this entry until at least one eval point arrives.
Hovering shows the nearest step and whichever of loss / evalLoss are present at that step (eval-only steps don’t show a loss value, and vice versa). The chart shows the Waiting for training.log events… placeholder until at least one event with a numeric loss or evalLoss arrives; training.log frames where both fields are null or omitted don’t count.
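Min-max scaling for a single series can be sketched as below. The function names, the flat-series fallback (centre of the range), and the choice to scale each series on its own are assumptions; the chart may well share one y-axis across both series.

```typescript
interface Point {
  step: number;
  value: number;
}

interface Extent {
  min: number;
  max: number;
}

// Callers must pass a non-empty array.
function extent(values: number[]): Extent {
  return { min: Math.min(...values), max: Math.max(...values) };
}

// Maps `value` from the data domain onto [rangeStart, rangeEnd].
function scale(value: number, domain: Extent, rangeStart: number, rangeEnd: number): number {
  if (domain.max === domain.min) {
    return (rangeStart + rangeEnd) / 2; // flat series: avoid dividing by zero
  }
  const t = (value - domain.min) / (domain.max - domain.min);
  return rangeStart + t * (rangeEnd - rangeStart);
}

// Produces the `points` attribute for an SVG <polyline>.
function toSvgPoints(points: Point[], width: number, height: number): string {
  if (points.length === 0) return "";
  const x = extent(points.map((p) => p.step));
  const y = extent(points.map((p) => p.value));
  return points
    // SVG y grows downward, so the largest value maps to y = 0.
    .map((p) => `${scale(p.step, x, 0, width)},${scale(p.value, y, height, 0)}`)
    .join(" ");
}
```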
Advanced metrics
The Advanced toggle in the chart’s header reveals a per-series statistics panel. Each card reports:
- Mean loss ± 95% CI: sample mean of the loss values together with the half-width of the 95% confidence interval (Student’s t-distribution; falls back to z = 1.96 for n > 31).
- Std dev and Variance: Bessel-corrected sample estimates (ddof=1).
- p90 and p95: linearly interpolated percentiles, matching numpy’s default convention.
The eval card stays empty until a training.log event with a numeric evalLoss arrives.
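The statistics listed above can be reproduced with a few short functions. This is a sketch, not Studio's implementation: the function names are hypothetical, and the Student's t critical value is left as a parameter (defaulting to the z = 1.96 fallback) because a t-quantile lookup is out of scope here.

```typescript
function mean(xs: number[]): number {
  return xs.reduce((sum, x) => sum + x, 0) / xs.length;
}

// Bessel-corrected sample variance (ddof = 1); needs at least two values.
function sampleVariance(xs: number[]): number {
  const m = mean(xs);
  const squared = xs.reduce((sum, x) => sum + (x - m) ** 2, 0);
  return squared / (xs.length - 1);
}

function sampleStdDev(xs: number[]): number {
  return Math.sqrt(sampleVariance(xs));
}

// Half-width of the confidence interval around the mean. Pass the
// Student's t critical value for n - 1 degrees of freedom when n is small.
function ciHalfWidth(xs: number[], critical: number = 1.96): number {
  return critical * Math.sqrt(sampleVariance(xs) / xs.length);
}

// Linearly interpolated percentile, q in [0, 100], following the
// convention numpy uses by default.
function percentile(xs: number[], q: number): number {
  if (xs.length === 0) return NaN;
  const sorted = [...xs].sort((a, b) => a - b);
  const rank = (q / 100) * (sorted.length - 1);
  const lo = Math.floor(rank);
  const hi = Math.ceil(rank);
  return sorted[lo] + (rank - lo) * (sorted[hi] - sorted[lo]);
}
```

For the values 1, 2, 3, 4, 5 this gives a mean of 3, a sample variance of 2.5, and p90 and p95 of roughly 4.6 and 4.8.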
Things this page does not do
- No cancel button. To stop a running job, call trainer.cancel() from your own code that drives the trainer. Studio does not expose this in the UI today.
- No artifact browser. The page reports the artifact count for completed jobs but does not list or link to individual artifacts. For full artifact access, use the cloud-api or the SDK’s onCompleted({ artifacts }) callback during the run.
- No mid-run inference. The Playground is for completed jobs only (see Playground). For live inspection during a run, use the SDK’s onCheckpoint({ infer }).