MSc Computing Project October 2025 A

This page presents the final dissertation artefacts for Encoder-Based Policy Guardrails for Autonomous Web Agents, including the dissertation, defense deck, benchmark-grounded PCM pipeline, trained model, and focused SuiteCRM pilot.

Key Artefacts

Dissertation PDF

The final submitted dissertation, including methodology, results, figures, limitations, and future work.

Defense Deck

The presentation used to communicate the research problem, artefact design, empirical evidence, and live pilot findings.

Project README

A practical guide to the final repository contents, reproduction steps, benchmark-grounded dataset, and retained comparison artefacts.

Hugging Face Model

The final benchmark-grounded PCM checkpoint released as a reusable text-classification artefact.

GitHub Repository

The full repository subtree for the dissertation artefacts, scripts, dataset, notebook, and evaluation harness.

Training Notebook

The notebook used to train and evaluate the benchmark-grounded PCM on cloud GPU infrastructure.

Final Results

Evaluation split	Precision	Recall	F1	FPR (false positive rate)	ROC-AUC
Standard test	0.9972	1.0000	0.9986	0.0028	1.0000
Challenge split	1.0000	0.8424	0.9145	0.0000	0.9792
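F1 is the harmonic mean of precision and recall, so each row's F1 can be cross-checked from the precision and recall reported in the same row. The short sketch below uses only the figures in the table above. The helper name `f1_score` is illustrative and is not taken from the project's evaluation harness.

```python
def f1_score(precision: float, recall: float) -> float:
    """Harmonic mean of precision and recall (0.0 when both are zero)."""
    if precision + recall == 0:
        return 0.0
    return 2 * precision * recall / (precision + recall)


# (precision, recall, reported F1) copied from the Final Results table.
reported = {
    "Standard test": (0.9972, 1.0000, 0.9986),
    "Challenge split": (1.0000, 0.8424, 0.9145),
}

for split, (p, r, f1_reported) in reported.items():
    # Recompute F1 from precision and recall and print it beside the reported value.
    print(f"{split}: computed F1 = {f1_score(p, r):.4f}, reported F1 = {f1_reported:.4f}")
```

Both rows agree to four decimal places. The challenge split's precision of 1.0000 is also consistent with its FPR of 0.0000: when there are no false positives, every action the model flags is a true violation. The lower F1 on that split therefore comes entirely from recall.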

Focused live SuiteCRM pilot:

The baseline agent completed the task with 8 observed violations.
The PCM-guarded agent reduced observed violations to 6, but introduced a live false positive by flagging the benign click on the Create Account link as a violation.


Research Contribution

The main contribution is a benchmark-grounded, encoder-based compliance layer that sits in front of a BrowserGym-compatible web agent without requiring any changes to the base agent. On the challenge split it reaches a precision of 1.0000 with zero false positives, although recall falls to 0.8424. The live pilot's false positive exposes the calibration problem that must still be solved before broader deployment.
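The placement described above can be pictured as a thin wrapper. It asks the unmodified agent for its next action, scores that action, and substitutes a fallback action when the estimated violation probability reaches a threshold. This is a minimal sketch under stated assumptions, not the dissertation's implementation. `GuardedAgent`, `score_fn`, the 0.5 threshold, and the `noop()` fallback are all hypothetical. The `get_action(obs) -> (action, info)` shape is modelled loosely on BrowserGym's agent convention and should be checked against the installed version. In a real deployment `score_fn` would wrap the released PCM checkpoint. Here a toy keyword scorer stands in so the example runs on its own.

```python
from dataclasses import dataclass, field
from typing import Any, Callable, Protocol


class BaseAgent(Protocol):
    """Hypothetical minimal agent interface: observation in, (action, info) out."""

    def get_action(self, obs: Any) -> tuple[str, dict]: ...


@dataclass
class GuardedAgent:
    """Wraps an unmodified agent and vetoes actions the scorer flags as violations."""

    agent: BaseAgent
    score_fn: Callable[[str], float]  # assumed: returns P(violation) for an action string
    threshold: float = 0.5            # decision threshold, i.e. the calibration knob
    fallback_action: str = "noop()"   # assumed safe action emitted instead of a blocked one
    blocked: list[str] = field(default_factory=list)

    def get_action(self, obs: Any) -> tuple[str, dict]:
        action, info = self.agent.get_action(obs)
        p_violation = self.score_fn(action)
        info = {**info, "pcm_score": p_violation}
        if p_violation >= self.threshold:
            # Record and replace the flagged action; the base agent itself is untouched.
            self.blocked.append(action)
            return self.fallback_action, {**info, "blocked_action": action}
        return action, info


class ScriptedAgent:
    """Toy agent that replays a fixed list of action strings."""

    def __init__(self, actions: list[str]) -> None:
        self._actions = iter(actions)

    def get_action(self, obs: Any) -> tuple[str, dict]:
        return next(self._actions), {}


def toy_scorer(action_text: str) -> float:
    """Stand-in for the classifier: treats any action mentioning 'delete' as risky."""
    return 0.9 if "delete" in action_text else 0.1


guard = GuardedAgent(ScriptedAgent(["click('12')", "click('delete-btn')"]), toy_scorer)
print(guard.get_action(None)[0])  # click('12')
print(guard.get_action(None)[0])  # noop()
```

Keeping the threshold as an explicit parameter isolates the calibration question. Raising it suppresses false positives like the one on the Create Account link, at the cost of letting more genuine violations through.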

Notes

The final dissertation checkpoint is stored locally in models/pcm_benchmark_grounded_hf/.
The repository also retains older model and evaluation folders for comparison.
ST-WebAgentBench/ is included locally because it is used both for dataset grounding and the SuiteCRM pilot environment.


Abdulhakim Bashir