Supervisor Cognitive Load Theory v1.0.
Complete web publication of SCLT: a full extension of Cognitive Load Theory for AI-agent supervision with framework, validation design, instrument requirements, and strategic program sequencing.
Internal working brief · Suitable for advisor circulation · Version 1.0 (April 2026) · Eindhoven
Executive summary
SCLT argues that classical CLT under-specifies the cognitive reality of supervising AI agent crews. Supervisors are not primarily learning domain schemas. They are calibrating trust, sampling outputs, integrating heterogeneous deliverables, and deciding intervention timing under uncertainty.
- SCLT keeps CLT's working-memory-bounded architecture and element interactivity foundation.
- SCLT reframes inherited categories and introduces supervisor-specific loads.
- SCLT proposes a factorial validation experiment, an extended NASA-TLX-S instrument, and a two-paper publication path.
- Commercial value is indirect: research credibility, IP defensibility, and academic anchoring of the Bandwidth Gap thesis.
1. Theoretical framework
1.1 CLT baseline and transfer limits
SCLT adopts post-2019 CLT foundations: capacity-limited working memory, additive load dynamics, and element interactivity as the core load driver. It flags two assumptions that do not transfer cleanly to supervision contexts: schema acquisition as the primary goal, and germane load as a separate practical category.
1.2 Variable transfer map
Working memory and element interactivity transfer directly. Intrinsic and extraneous load are reformulated as task-intrinsic and interface-extraneous variants in supervision contexts. Germane load is replaced with operational constructs.
1.3 New supervisor-specific load categories
- Trust Calibration Load (TCL): cognitive cost of updating per-agent reliability priors.
- Oversight Load (OL): cognitive cost of sampling and evaluating outputs.
- Integration Load (IL): cognitive cost of reconciling heterogeneous outputs.
- Intervention Decision Load (IDL): cognitive cost of deciding when/how to intervene.
1.4 Rejected extra categories
Context switching, provenance tracking, and authority load are treated as components/modifiers within TCL, OL, IL, and IDL rather than standalone constructs to preserve parsimony.
1.5 Formal relationship
SCLT is expressed as additive composition of inherited and supervisor-specific loads, bounded by working-memory capacity. Performance degrades when the combined load crosses capacity.
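Source form: SCL(t) = TIL + IEL + TCL(N, sigma_R, nu) + OL(s, c, e) + IL(h, d) + IDL(alpha, k, gamma), where TIL and IEL are the task-intrinsic and interface-extraneous loads from section 1.2 and the remaining four terms are the supervisor-specific loads from section 1.3.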
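The additive composition can be illustrated with a minimal Python sketch. This brief does not give the functional form of each term or define the parameters in parentheses, so the sketch treats every load as an already-estimated number on an arbitrary common scale. The names `LoadComponents`, `total_load`, and `exceeds_capacity`, the capacity value, and the example numbers are all hypothetical.

```python
from dataclasses import dataclass, fields


@dataclass(frozen=True)
class LoadComponents:
    """One snapshot of the six SCLT load terms (arbitrary common units; hypothetical)."""
    til: float  # task-intrinsic load
    iel: float  # interface-extraneous load
    tcl: float  # trust calibration load
    ol: float   # oversight load
    il: float   # integration load
    idl: float  # intervention decision load


def total_load(loads: LoadComponents) -> float:
    """Additive composition: SCL = TIL + IEL + TCL + OL + IL + IDL."""
    return sum(getattr(loads, f.name) for f in fields(loads))


def exceeds_capacity(loads: LoadComponents, capacity: float) -> bool:
    """True when the combined load crosses the assumed working-memory bound."""
    return total_load(loads) > capacity


# Illustrative values only; they are not estimates from any study.
snapshot = LoadComponents(til=2.0, iel=1.0, tcl=1.5, ol=1.0, il=0.5, idl=1.0)
print(total_load(snapshot))
print(exceeds_capacity(snapshot, capacity=6.0))
```

Treating the bound as a single scalar corresponds to a single-resource capacity model, which is only one of the options the brief leaves open.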
1.6 Distinctive predictions
SCLT predicts an inverted-U relation between performance and number of supervised agents, trust-transition error clustering, and a skill trajectory where calibration proficiency grows while hands-on domain skill may stagnate.
1.7 Anchor diagram
The source defines a single visual anchor: bounded WM core with six load streams, performance output curve, and feedback loops from intervention outcomes and saturation effects.
2. Precedent integration
The source positions SCLT as complementary to Multiple Resource Theory, Levels of Automation taxonomies, and Situation Awareness models, while operationalizing cognitive-cost mechanisms missing from those frameworks.
- Subsumes selected supervisory-control claims by providing measurable load constructs.
- Complements existing resource and automation frameworks at a different analysis level.
- Reframes rather than directly contradicts modern CLT revisions around germane load.
- Extends single-decision human-AI teaming studies to portfolio-level multi-agent supervision.
3. Validation experiment design
3.1 Paradigm
Within-subject 2x2x2x2 design over reliability variance, required sampling rate, output heterogeneity, and task ambiguity. Participants supervise four-agent crews across all 16 cells.
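As a quick check on the cell count, the full crossing of the four factors can be enumerated in Python. The factor names follow the design description; the two-level labels `low` and `high` and the function name `design_cells` are assumptions made for illustration.

```python
from itertools import product

# Factor names come from the design description; the level labels are assumed.
FACTORS = {
    "reliability_variance": ("low", "high"),
    "required_sampling_rate": ("low", "high"),
    "output_heterogeneity": ("low", "high"),
    "task_ambiguity": ("low", "high"),
}


def design_cells(factors):
    """Return one dict per cell of the full factorial crossing."""
    names = list(factors)
    return [dict(zip(names, levels)) for levels in product(*factors.values())]


cells = design_cells(FACTORS)
print(len(cells))  # 2 * 2 * 2 * 2 = 16
```

Each dict in `cells` names one condition a participant would complete, which makes it straightforward to attach trial order or counterbalancing information later.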
3.2-3.3 Task domain and manipulations
Document review/synthesis tasks are used because they mirror current agent workflows and allow controllable heterogeneity and dependency conditions. Pilot calibration ensures detectable subscale shifts without floor/ceiling effects.
3.4 Dependent measures
- Performance: expert-rated final quality, error profiles, completion time.
- Subjective load: NASA-TLX-S 10-subscale variant.
- Calibration: per-agent trust ratings and Brier scores.
- Behavior: sampling rate, intervention rate, decision latency.
- Optional physiology: HRV and pupillometry (standard tier and above).
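For the calibration measure, the Brier score is the mean squared difference between a stated probability and the binary outcome, so lower values indicate better calibration. The sketch below assumes that per-agent trust ratings have been rescaled to a probability that the agent's output is correct; that rescaling, the function name, and the example data are hypothetical, since the brief does not specify how ratings map to probabilities.

```python
def brier_score(forecasts, outcomes):
    """Mean squared difference between stated probabilities and 0/1 outcomes."""
    if len(forecasts) != len(outcomes) or not forecasts:
        raise ValueError("forecasts and outcomes must be non-empty and equal length")
    return sum((p - o) ** 2 for p, o in zip(forecasts, outcomes)) / len(forecasts)


# Hypothetical data: one participant's trust in one agent, expressed as the
# probability that each output is correct, and whether each output was correct.
trust_as_probability = [0.5, 0.5, 1.0, 0.0]
output_was_correct = [1, 0, 1, 0]
print(brier_score(trust_as_probability, output_was_correct))  # 0.125
```

A participant who always answers 0.5 scores 0.25 regardless of outcomes, which gives a convenient uninformed reference point when interpreting per-agent scores.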
3.5 Analytic plan
Mixed-effects primary models with interaction testing and mediation analyses matching each manipulation to targeted load constructs for discriminant validity.
3.6 Pre-registration and open science
All hypotheses and analyses are pre-registered before data collection; materials, anonymized data, and code are published after study completion.
4. Instrument requirements
4.1 NASA-TLX-S
The source retains six NASA-TLX core subscales and adds four supervisor-specific subscales aligned to TCL, OL, IL, and IDL.
- Trust Calibration Demand.
- Oversight Demand.
- Integration Demand.
- Intervention Decision Demand.
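The ten-subscale structure can be sketched as a small scoring helper. The six core names are the standard NASA-TLX subscales and the four additions are those listed above. The 0-100 rating scale, the unweighted means, and the function name `score_response` are assumptions; the brief does not specify a scoring rule.

```python
from statistics import mean

CORE_SUBSCALES = (
    "Mental Demand", "Physical Demand", "Temporal Demand",
    "Performance", "Effort", "Frustration",
)
SUPERVISOR_SUBSCALES = (
    "Trust Calibration Demand", "Oversight Demand",
    "Integration Demand", "Intervention Decision Demand",
)


def score_response(ratings):
    """Return unweighted means for the core, supervisor, and full subscale sets.

    Assumes every subscale is rated on the same 0-100 scale (an assumption,
    not something the brief specifies).
    """
    expected = CORE_SUBSCALES + SUPERVISOR_SUBSCALES
    missing = [name for name in expected if name not in ratings]
    if missing:
        raise ValueError(f"missing subscales: {missing}")
    return {
        "core": mean(ratings[name] for name in CORE_SUBSCALES),
        "supervisor": mean(ratings[name] for name in SUPERVISOR_SUBSCALES),
        "overall": mean(ratings[name] for name in expected),
    }


# Hypothetical response: moderate core workload, higher supervisor-specific load.
response = {name: 50.0 for name in CORE_SUBSCALES}
response.update({name: 70.0 for name in SUPERVISOR_SUBSCALES})
print(score_response(response))
```

Reporting the core and supervisor means separately keeps the inherited NASA-TLX score comparable with prior work while exposing the added subscales for discriminant-validity checks.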
4.2 Validation cycle
Draft instrument, cognitive interview pilot, refinement, main validation, confirmatory factor analysis, then v1 publication. Fit criteria in source: CFI > .95, RMSEA < .06, SRMR < .08.
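The stated fit criteria can be encoded as a simple check applied to the output of whatever CFA software is used. The cutoffs below are the ones given in the brief; the function name `check_fit` and the example index values are hypothetical.

```python
# Cutoffs as stated in the brief: CFI > .95, RMSEA < .06, SRMR < .08.
CFI_MIN, RMSEA_MAX, SRMR_MAX = 0.95, 0.06, 0.08


def check_fit(cfi, rmsea, srmr):
    """Return a per-index pass/fail dict plus an overall verdict."""
    results = {
        "cfi": cfi > CFI_MIN,
        "rmsea": rmsea < RMSEA_MAX,
        "srmr": srmr < SRMR_MAX,
    }
    results["all_pass"] = all(results.values())
    return results


# Hypothetical fit indices, not results from any study.
print(check_fit(cfi=0.97, rmsea=0.05, srmr=0.04))
```

The comparisons are strict, so a model with CFI exactly equal to .95 does not pass under this reading of the criteria.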
5. Participant population and N
Target population is knowledge workers with sustained AI-tool use across technical, analytical, and creative segments.
- Recommended main sample N=80 for mediation and discriminant-validity goals.
- In-lab subset n=20 supports physiological convergence checks.
- Larger N for moderator mapping is deferred to follow-up studies.
6. Publication strategy
The source proposes a two-paper track: theory-first then empirical validation.
- Theoretical paper target: Educational Psychology Review.
- Empirical paper target: Journal of Experimental Psychology: Applied.
- Cross-citations with Bandwidth Gap, supervisor instrument, and babysitting-tax outputs.
- Open science commitments: OSF preregistration, open instrument, open code/data, PsyArXiv preprint.
7. Strategic positioning
The brief includes a phased engagement strategy with CLT and human-factors networks, starting with extension framing and feedback requests before deeper collaboration attempts.
- Priority outreach path includes active CLT collaborators in the Netherlands ecosystem.
- Dual-track endorsement strategy reduces single-community gatekeeping risk.
- Founder PhD path is framed as optional and sequenced after theoretical-paper momentum.
8. Timeline and budget
The source details a 24-month roadmap across framework drafting, theoretical submission, pilot/instrument refinement, main data collection, analysis, and empirical submission.
- Lean tier: approximately €80k-€120k.
- Standard tier (recommended): approximately €180k-€250k.
- Premium tier: approximately €350k-€500k.
The brief also includes WBSO and Innovation Box alignment guidance for Dutch IP substance and R&D documentation.
9. Risks and open questions
9.1 Theoretical risks
Potential overlap criticism with supervisory-control and MRT frameworks is anticipated and addressed by mechanism-level distinction and predictive specificity.
9.2 Empirical risks
Key risks include insufficient discriminant validity, underpowered effects, and overreach from a single experiment. Mitigations include pilot refinement and multi-study follow-up planning.
9.3 Strategic risks
Gatekeeping, endorsement uncertainty, and credential perception risks are handled through open artifacts and a multi-node collaboration strategy.
9.4 Open design questions
- Whether germane load is eliminated or reframed as long-timescale calibration schema.
- IDL relation to SA level-3 projection constructs.
- Single-resource vs multi-resource capacity model.
- Best baseline condition (solo, single-agent, or human-team supervision).
- Trust-calibration accuracy operationalization in ambiguous tasks.
10. Sequencing and coordination
SCLT is sequenced as a Year-2 capstone linked to prior instrument and babysitting-tax work. The source mandates shared ownership of the NASA-TLX-S identity across tracks, and sequencing such that the theory paper cites empirical precursor outputs where possible.
11. Final recommendation
Run SCLT at standard tier over 24 months, submit theory first, execute N=80 validation with co-developed NASA-TLX-S, then submit empirical results. Position SCLT as research backbone and credibility engine rather than short-term revenue driver.
Selected references
Core references in the source include Sweller (1988), Sweller et al. (1998; 2019), Wickens (2008), Endsley (1995), Parasuraman et al. (2000), Sheridan (1992), Mosier & Skitka (1996), Bansal et al. (2019), Lai et al. (2021), Vereschak et al. (2021), and Fritz & MacKinnon (2007).
Full reference expansion remains part of the drafting package described in the source document.