Abstract
This systems project tests whether AI agents can reliably check citation validity, code reliability, and sensitivity inside a reproducible research workflow, and identifies where automation is safe and where it introduces hidden risk. Findings are preliminary and specific to the systems studied. Throughout, AI agents act as software roles under human approval, not as independent researchers.