My e-portfolio, based on work carried out during my MSc programme in Artificial Intelligence and Machine Learning at the University of Essex.
This page presents the final dissertation artefacts for Encoder-Based Policy Guardrails for Autonomous Web Agents, including the dissertation, defense deck, benchmark-grounded PCM pipeline, trained model, and focused SuiteCRM pilot.
- The final submitted dissertation, including methodology, results, figures, limitations, and future work.
- The presentation used to communicate the research problem, artefact design, empirical evidence, and live pilot findings.
- A practical guide to the final repository contents, reproduction steps, benchmark-grounded dataset, and retained comparison artefacts.
- The final benchmark-grounded PCM checkpoint, released as a reusable text-classification artefact.
- The full repository subtree for the dissertation artefacts, scripts, dataset, notebook, and evaluation harness.
- The notebook used to train and evaluate the benchmark-grounded PCM on cloud GPU infrastructure.
| Evaluation | Precision | Recall | F1 | FPR | ROC-AUC |
|---|---|---|---|---|---|
| Standard test | 0.9972 | 1.0000 | 0.9986 | 0.0028 | 1.0000 |
| Challenge split | 1.0000 | 0.8424 | 0.9145 | 0.0000 | 0.9792 |
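The F1 column is the harmonic mean of precision and recall, F1 = 2PR / (P + R), and FPR is FP / (FP + TN). The short Python sketch below recomputes F1 from the reported precision and recall as a consistency check on the table. The helper names `f1_from_pr` and `rates_from_counts` are hypothetical and are not taken from the dissertation's evaluation harness.

```python
def f1_from_pr(precision: float, recall: float) -> float:
    """Harmonic mean of precision and recall (0.0 when both are zero)."""
    if precision + recall == 0:
        return 0.0
    return 2 * precision * recall / (precision + recall)


def rates_from_counts(tp: int, fp: int, tn: int, fn: int):
    """Precision, recall, F1 and false-positive rate from confusion-matrix counts."""
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    fpr = fp / (fp + tn) if fp + tn else 0.0
    return precision, recall, f1_from_pr(precision, recall), fpr


# Cross-check the reported F1 values against the reported precision and recall.
reported = {
    "Standard test": (0.9972, 1.0000, 0.9986),
    "Challenge split": (1.0000, 0.8424, 0.9145),
}
for name, (p, r, f1) in reported.items():
    print(f"{name}: recomputed F1 = {f1_from_pr(p, r):.4f}, reported = {f1:.4f}")
```

Both rows agree to four decimal places. ROC-AUC is omitted because it needs the per-example scores rather than the thresholded counts.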
Focused live SuiteCRM pilot:
8 observed violations.
…6, but introduced a live false positive on the action "click Create Account (link)".

The main contribution is a benchmark-grounded, encoder-based compliance layer that can be placed in front of a BrowserGym-compatible web agent without modifying the base agent itself. The results show perfect challenge-split precision (1.0000) and zero false positives on the challenge split. The live pilot, however, exposes a remaining calibration problem that must be solved before broader deployment.
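The guardrail pattern described above can be sketched as a wrapper. The wrapper asks the unmodified base agent for an action, scores that action against a policy, and blocks it when the violation score crosses a threshold. This is a minimal sketch under stated assumptions. `GuardedAgent`, `Decision`, the `get_action` method, the `score_violation` callable and the 0.5 default threshold are all hypothetical. They are not the dissertation's code or the BrowserGym API. The encoder is injected as a plain callable, and a toy keyword scorer stands in for it, so the example runs without the trained checkpoint.

```python
from dataclasses import dataclass
from typing import Callable, Protocol


class Agent(Protocol):
    # Hypothetical minimal agent interface: observation in, action string out.
    def get_action(self, obs: dict) -> str: ...


@dataclass
class Decision:
    action: str
    allowed: bool
    score: float


class GuardedAgent:
    """Wraps a base agent without modifying it and vetoes flagged actions."""

    def __init__(self, base: Agent, score_violation: Callable[[str, str], float],
                 policy: str, threshold: float = 0.5):
        self.base = base
        self.score_violation = score_violation  # (policy, action) -> violation score
        self.policy = policy
        # Calibration knob: a higher threshold blocks fewer actions
        # (fewer false positives, but more missed violations).
        self.threshold = threshold

    def step(self, obs: dict) -> Decision:
        action = self.base.get_action(obs)  # base agent is called as-is
        score = self.score_violation(self.policy, action)
        return Decision(action=action, allowed=score < self.threshold, score=score)


# Usage with toy stand-ins (hypothetical, for illustration only).
class EchoAgent:
    def get_action(self, obs: dict) -> str:
        return obs["proposed"]


def keyword_scorer(policy: str, action: str) -> float:
    # Toy replacement for the encoder: treats any deletion as a likely violation.
    return 0.9 if "delete" in action.lower() else 0.1


guard = GuardedAgent(EchoAgent(), keyword_scorer, policy="Never delete records.")
print(guard.step({"proposed": "click('Delete Account')"}).allowed)  # False
print(guard.step({"proposed": "click('Create Account')"}).allowed)  # True
```

The wrapper returns a decision rather than executing the action. A real integration would have to decide what to do with a blocked action, for example substituting a no-op or a refusal. The threshold is where the calibration problem from the live pilot would be addressed.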
… `models/pcm_benchmark_grounded_hf/`.

`ST-WebAgentBench/` is included locally because it is used both for dataset grounding and as the SuiteCRM pilot environment.