Research

I am a DBA researcher and practitioner. My dissertation examines how modern engineering practices, including SRE, DevOps, cloud engineering, and automation, contribute to sustainable software engineering, operational efficiency, cost optimization, and long-term business value. The work is industry-oriented and evidence-driven. Everything below started as a problem observed inside real enterprise systems.

Evidence-Grounded Release Governance (EGRG)

In most large organizations, the decision to release software is a judgment call informed by partial evidence. The delivery pipeline knows whether tests passed. Monitoring knows how the system behaves in production. Quality records know how similar changes performed historically. SLO tracking knows how much error budget remains. These sources rarely meet in one place, and the decision itself usually leaves no auditable trace of what was known when it was made.

EGRG addresses this directly. It is a framework, with a published prototype, that links CI/CD signals, runtime telemetry, historical quality data, and SLO state into a single release-decision graph. Each release decision becomes a recorded, inspectable object: the evidence that supported it, the state of the system at the time, and the path from evidence to outcome. For regulated environments, this turns release governance from a documentation exercise into an evidence-grounded engineering practice.
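To make the idea of a recorded, inspectable decision concrete, here is a minimal sketch of what one decision record in such a graph could look like. It is hypothetical and is not the schema of the EGRG prototype. The names Source, Evidence, ReleaseDecision and decide, the "approve"/"hold" outcomes, and the policy "approve only if all four sources are represented and nothing argues against release" are illustrative assumptions. The four sources are the ones named above. Each evidence item becomes a node with an edge to the release node, and the decision serializes to a stable JSON record that could be kept for audit.

```python
import json
from dataclasses import dataclass, field
from datetime import datetime, timezone
from enum import Enum
from typing import Optional


class Source(Enum):
    """The four evidence sources named in the text."""
    CI_CD = "ci_cd"
    TELEMETRY = "runtime_telemetry"
    QUALITY_HISTORY = "historical_quality"
    SLO = "slo_state"


@dataclass(frozen=True)
class Evidence:
    evidence_id: str
    source: Source
    claim: str      # what the evidence asserts, in plain words
    supports: bool  # True if it argues for release, False if against


@dataclass
class ReleaseDecision:
    release_id: str
    outcome: str     # "approve" or "hold" in this hypothetical sketch
    decided_at: str  # ISO-8601 timestamp of when the decision was made
    evidence: list[Evidence] = field(default_factory=list)
    # Edges of the decision graph: each evidence node points at the release node.
    edges: list[tuple[str, str]] = field(default_factory=list)

    def missing_sources(self) -> set[Source]:
        """Sources that contributed no evidence to this decision."""
        return set(Source) - {e.source for e in self.evidence}

    def to_record(self) -> str:
        """Serialize the decision as a stable JSON audit record."""
        return json.dumps(
            {
                "release_id": self.release_id,
                "outcome": self.outcome,
                "decided_at": self.decided_at,
                "missing_sources": sorted(s.value for s in self.missing_sources()),
                "evidence": [
                    {"id": e.evidence_id, "source": e.source.value,
                     "claim": e.claim, "supports": e.supports}
                    for e in self.evidence
                ],
                "edges": [list(edge) for edge in self.edges],
            },
            sort_keys=True,
        )


def decide(release_id: str, evidence: list[Evidence],
           now: Optional[datetime] = None) -> ReleaseDecision:
    """Hypothetical policy: approve only if every source is represented
    and no evidence item argues against the release; otherwise hold."""
    now = now or datetime.now(timezone.utc)
    missing = set(Source) - {e.source for e in evidence}
    against = [e for e in evidence if not e.supports]
    outcome = "approve" if not missing and not against else "hold"
    return ReleaseDecision(
        release_id=release_id,
        outcome=outcome,
        decided_at=now.isoformat(),
        evidence=list(evidence),
        edges=[(e.evidence_id, release_id) for e in evidence],
    )


# Illustrative evidence only; no historical-quality item is supplied.
evidence = [
    Evidence("ci-1", Source.CI_CD, "pipeline tests passed", True),
    Evidence("tel-1", Source.TELEMETRY, "error rate within baseline", True),
    Evidence("slo-1", Source.SLO, "error budget not exhausted", True),
]
decision = decide("rel-42", evidence, now=datetime(2024, 1, 1, tzinfo=timezone.utc))
print(decision.outcome)  # hold
print(decision.to_record())
```

The point of the design is that a "hold" is explainable from the record itself: the missing_sources field shows that historical quality data was never consulted, which is exactly the kind of gap that otherwise leaves no trace.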

The published EGRG prototype repository will be linked here. Until then, details are available on request.

The four-axis reliability model

Traditional reliability practice measures availability and stops there. That is no longer sufficient for systems that must also be affordable, accountable, and increasingly autonomous. The four-axis model widens the lens. Availability and resilience remain the first axis, the operational core. The second axis is sustainability: the energy, carbon, and cost footprint of keeping a system reliable. The third is governance: whether operational decisions are explainable and auditable. The fourth is intelligent-system risk: the new failure modes introduced when AI components participate in a system's behavior or its operations.

The model is a practical assessment instrument, not an abstraction. Its purpose is to give engineering leaders a defensible answer to a harder question than "is it up?": is this system reliable in every sense that the business and its regulators now care about?
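One way to read the "every sense" question is as an all-axes check, where a system counts as reliable only if each of the four axes has been assessed and none has an unresolved problem. The sketch below illustrates that reading under stated assumptions. The names Axis, Finding and assess, the pass/fail representation of findings, and the all-axes rule itself are hypothetical and do not describe the instrument's actual criteria. It also does not model a system with no AI components, where the fourth axis might reasonably be recorded as not applicable.

```python
from dataclasses import dataclass
from enum import Enum


class Axis(Enum):
    AVAILABILITY_RESILIENCE = "availability and resilience"
    SUSTAINABILITY = "sustainability"
    GOVERNANCE = "governance"
    INTELLIGENT_SYSTEM_RISK = "intelligent-system risk"


@dataclass(frozen=True)
class Finding:
    axis: Axis
    acceptable: bool  # hypothetical pass/fail judgement for this finding
    note: str


def assess(findings: list[Finding]) -> tuple[bool, dict[Axis, str]]:
    """Return (reliable, gaps) under a hypothetical all-axes rule: reliable
    only if every axis was assessed and no finding on it is unacceptable."""
    gaps: dict[Axis, str] = {}
    for axis in Axis:
        on_axis = [f for f in findings if f.axis is axis]
        if not on_axis:
            gaps[axis] = "not assessed"
        else:
            problems = [f.note for f in on_axis if not f.acceptable]
            if problems:
                gaps[axis] = "; ".join(problems)
    return (not gaps, gaps)


# Illustrative findings: governance has a problem, AI risk was never assessed.
findings = [
    Finding(Axis.AVAILABILITY_RESILIENCE, True, "SLOs met over the review period"),
    Finding(Axis.SUSTAINABILITY, True, "cost per request tracked against budget"),
    Finding(Axis.GOVERNANCE, False, "rollback decisions are not logged"),
]
reliable, gaps = assess(findings)
print(reliable)  # False
for axis, reason in gaps.items():
    print(f"{axis.value}: {reason}")
```

Reporting the gaps per axis, rather than a single blended score, keeps the answer defensible: a strong availability record cannot hide an unauditable decision trail.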

AREBench: agent reliability engineering

As enterprises move LLM-based and agentic systems toward production, the reliability toolkit has to follow. AREBench is an Agent Reliability Engineering framework for evaluating the observability and operational reliability of LLM-based and other AI systems. It asks what should be measured, what failure looks like when behavior is probabilistic, and what production readiness means for a system that acts rather than only responds. This is active research with a practical goal: to give operations teams evidence-based instruments before autonomous systems arrive in their production environments.

Current research questions

These are open lines of inquiry, not settled positions.

What evidence should a release decision in a regulated enterprise be able to cite, and in what form should that evidence survive an audit?
How do SLOs and error budgets extend to agentic systems whose behavior is probabilistic rather than deterministic?
How should sustainability and cost signals enter reliability governance as first-class inputs rather than afterthoughts?
What does production readiness mean for intelligent operational tooling that takes actions without a human in the loop?
What should enterprise reliability teams measure today to prepare their infrastructure and governance for the transition to post-quantum cryptography?