Why Code Review Matters#
Code review catches bugs, but that is its least important function. Reviews spread knowledge across the team, enforce consistency, and create a record of design decisions. A codebase where every change passes through at least one other set of eyes develops fewer dark corners that only one person understands.
The cost is time. A poorly structured review process adds days to every change. The goal is to make reviews fast, focused, and useful – not ceremonial.
The Review Checklist#
A consistent checklist prevents the two failure modes of reviews: rubber-stamping (approving without reading) and bikeshedding (debating trivia while missing real issues).
Correctness
- Does the code do what the PR description says it does?
- Are edge cases handled (empty inputs, nulls, boundary values, concurrent access)?
- Are error paths handled? Does the code fail gracefully or silently swallow errors?
Security
- Is user input validated and sanitized?
- Are secrets externalized rather than hardcoded?
- Does the change introduce new attack surface (new endpoints, new file uploads, new deserialization)?
- Are authorization checks present on new endpoints?
Design
- Is the change in the right place architecturally? Does it respect existing module boundaries?
- Are new abstractions justified or is this over-engineering?
- Could the change be smaller without losing coherence?
Testing
- Are there tests? Do they test behavior, not implementation details?
- Do the tests cover the failure cases, not just the happy path?
- Are flaky test patterns avoided (sleeping, hardcoded ports, time-dependent assertions)?
Readability
- Would a new team member understand this code in six months?
- Are variable and function names descriptive?
- Do comments explain “why” rather than “what”?
Operational Impact
- Does this change affect deployment (new environment variables, new infrastructure, database migrations)?
- Is there a rollback plan if this breaks in production?
- Does it affect observability (new metrics, changed log formats, altered traces)?
Keep the checklist in a shared document or embed it as a PR template:
```markdown
<!-- .github/PULL_REQUEST_TEMPLATE.md -->
## What does this change do?

## How was it tested?

## Review checklist
- [ ] Correctness: edge cases and error handling
- [ ] Security: input validation, secrets, authz
- [ ] Tests: behavior-based, failure cases covered
- [ ] Operational: deployment impact documented
```
PR Size Guidelines#
The single most impactful thing you can do for review quality is to keep PRs small. Research from Google and Microsoft consistently shows that review effectiveness drops sharply above 400 lines of change.
Sizing targets:
- Under 200 lines: Ideal. Reviewer can hold the entire change in their head.
- 200-400 lines: Acceptable. Requires focused attention.
- 400-800 lines: Too large for effective review. Split if possible.
- Over 800 lines: Almost certainly should be multiple PRs.
These counts exclude generated files, lock files, and test fixtures. A 600-line PR where 400 lines are test data is effectively 200 lines of review.
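If you host on GitHub (which the automation examples below assume), you can make this explicit by marking those paths with the `linguist-generated` attribute in `.gitattributes`, which collapses their diffs in PRs by default; the paths here are only examples:
```
# .gitattributes
package-lock.json    linguist-generated=true
poetry.lock          linguist-generated=true
tests/fixtures/**    linguist-generated=true
```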
Strategies for keeping PRs small:
- Separate refactoring from feature work. A PR that renames a function across 50 files AND adds a new feature is two PRs.
- Use feature flags to merge incomplete features safely. Ship the backend in one PR, the frontend in another.
- Stack PRs: create a chain where each PR builds on the previous. Tools like `ghstack`, `spr`, and Graphite manage stacked PRs. Each PR is small and reviewable; together they form the full feature.
- Extract mechanical changes: dependency updates, code formatting, import reorganization – each gets its own PR that reviewers can approve quickly.
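Of these, feature flags need the least infrastructure to get started. A minimal sketch in Python, assuming only an environment-variable convention (the flag name and the `create_order_*` functions are illustrative; real systems usually graduate to a flag service or config table):
```python
# feature_flags.py: a minimal sketch, not a production flag system
import os

def is_enabled(flag: str) -> bool:
    """Flags are env vars such as FEATURE_NEW_CHECKOUT=1; default is off."""
    return os.environ.get(f"FEATURE_{flag.upper()}", "0") == "1"

def create_order_v1(cart: list[dict]) -> dict:
    return {"pipeline": "v1", "items": cart}  # existing path, stays the default

def create_order_v2(cart: list[dict]) -> dict:
    return {"pipeline": "v2", "items": cart}  # new code, merged but dark in prod

def create_order(cart: list[dict]) -> dict:
    # Each slice of the new flow merges as a small PR behind the flag,
    # instead of one giant branch reviewed at the end.
    if is_enabled("new_checkout"):
        return create_order_v2(cart)
    return create_order_v1(cart)
```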
Review Automation#
Automate everything that does not require human judgment. Humans should review design and logic; machines should check formatting, linting, and policy.
Linters in CI: Run linting on every PR. Block merging if linting fails. This eliminates entire categories of review comments.
```yaml
# .github/workflows/lint.yml
name: Lint
on: [pull_request]
jobs:
  lint:
    runs-on: ubuntu-latest
    steps:
      - uses: actions/checkout@v4
      - uses: actions/setup-python@v5
        with:
          python-version: "3.12"
      - run: pip install ruff
      - run: ruff check .
      - run: ruff format --check .
```
CODEOWNERS: Automatically assign reviewers based on which files changed. This ensures the right people see the right changes.
```
# .github/CODEOWNERS
# Default: platform team reviews everything
* @org/platform-team

# Service-specific ownership
/services/auth/ @org/auth-team
/services/billing/ @org/billing-team

# Infrastructure: ops team must review
/terraform/ @org/ops-team
/helm/ @org/ops-team
/.github/workflows/ @org/ops-team

# API contracts: require API steward
/api/openapi.yaml @api-steward
/proto/ @api-steward
```
CODEOWNERS assigns reviewers automatically; to require that at least one person from the matching group approve the PR before it can merge, enable the “Require review from Code Owners” branch protection rule.
Danger JS / Danger Ruby: Programmable PR review automation. Add warnings for common issues:
```js
// dangerfile.js
const { danger, warn, fail } = require('danger');

// Warn on large PRs
const linesChanged = danger.github.pr.additions + danger.github.pr.deletions;
if (linesChanged > 500) {
  warn(`This PR changes ${linesChanged} lines. Consider splitting it.`);
}

// Require a description (pr.body can be null when it was left empty)
if (!danger.github.pr.body || danger.github.pr.body.length < 20) {
  fail('PR description is too short. Explain what this change does and why.');
}

// Flag changes to critical files
const criticalFiles = danger.git.modified_files.filter(f =>
  f.includes('schema/') || f.includes('migration')
);
if (criticalFiles.length > 0) {
  warn(`This PR modifies database schema: ${criticalFiles.join(', ')}`);
}
```
Semgrep: Custom static analysis rules that encode your team’s patterns:
```yaml
# .semgrep/rules.yml
rules:
  - id: no-raw-sql
    patterns:
      - pattern: cursor.execute($QUERY, ...)
      - pattern-not: cursor.execute($QUERY, $PARAMS)
    message: "Use parameterized queries to prevent SQL injection"
    severity: ERROR
    languages: [python]
```
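To make these rules blocking rather than advisory, run Semgrep in CI next to the linters. A sketch of a standalone workflow (the filename and job layout are arbitrary; two extra steps in the lint job above would work just as well):
```yaml
# .github/workflows/semgrep.yml
name: Semgrep
on: [pull_request]
jobs:
  semgrep:
    runs-on: ubuntu-latest
    steps:
      - uses: actions/checkout@v4
      - run: pip install semgrep
      # --error exits non-zero when any rule matches, which blocks the merge
      - run: semgrep scan --config .semgrep/rules.yml --error
```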
Review Comment Best Practices#
Review comments should be clear about whether they are blocking, optional, or informational. A simple prefix convention works:
- blocker: – Must be addressed before merge. “blocker: This SQL query is vulnerable to injection.”
- nit: – Minor style or preference issue. Not blocking. “nit: I’d name this `user_count` instead of `cnt`.”
- question: – Asking for clarification, not requesting a change. “question: Is there a reason we’re not using the existing `UserService` here?”
- suggestion: – Optional alternative approach. “suggestion: `itertools.groupby` might simplify this loop.”
Rules for reviewers:
- Explain the “why” behind requested changes. “Move this to a helper function” is less useful than “This validation logic appears in three endpoints; extracting it prevents drift.”
- Suggest concrete alternatives. Do not just say “this could be better.” Show what “better” looks like.
- Distinguish personal preference from team convention. If the style guide does not mandate it, it is a “nit:” at most.
- Review the code, not the author. “This function does too much” is fine. “You always write functions that do too much” is not.
Rules for authors:
- Respond to every comment, even if only with “Done.” Reviewers should not have to check whether their feedback was addressed.
- If you disagree, explain your reasoning. “I considered that but chose this approach because X.” This is a conversation, not a command chain.
- Do not take review comments personally. The reviewer is improving the code, not criticizing you.
Review Metrics#
Track metrics to identify process problems, not to evaluate individuals. Useful metrics:
- Time to first review: How long a PR waits before anyone looks at it. Target: under 4 hours during business hours. If PRs sit for days, reviewers are overloaded or the assignment process is broken.
- Time to merge: Total time from PR creation to merge. This includes review rounds, CI time, and rework. Track the median, not the mean – outliers distort the mean.
- Review rounds: How many back-and-forth cycles before approval. More than 2 rounds usually means the PR was too large or the author and reviewer had different assumptions. One round is ideal.
- PR size distribution: Plot the distribution of PR sizes. If the median is above 400 lines, the team needs to work on splitting changes.
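Most of these can be pulled straight from your hosting platform’s API. A rough sketch for time to first review, assuming a GitHub-hosted repo, a token in `GH_TOKEN`, and placeholder `your-org/your-repo` values (pagination, bot reviews, and business-hours math are left out):
```python
# review_latency.py: a rough sketch, not a finished tool
import os
import statistics
from datetime import datetime

import requests

OWNER, REPO = "your-org", "your-repo"  # placeholders
API = f"https://api.github.com/repos/{OWNER}/{REPO}"
HEADERS = {"Authorization": f"Bearer {os.environ['GH_TOKEN']}"}

def ts(value: str) -> datetime:
    return datetime.strptime(value, "%Y-%m-%dT%H:%M:%SZ")

def hours_to_first_review(pr: dict) -> float | None:
    reviews = requests.get(f"{API}/pulls/{pr['number']}/reviews", headers=HEADERS).json()
    submitted = [ts(r["submitted_at"]) for r in reviews if r.get("submitted_at")]
    if not submitted:
        return None  # closed or merged without review
    return (min(submitted) - ts(pr["created_at"])).total_seconds() / 3600

prs = requests.get(f"{API}/pulls", params={"state": "closed", "per_page": 50}, headers=HEADERS).json()
waits = [h for pr in prs if (h := hours_to_first_review(pr)) is not None]
if waits:
    print(f"Median hours to first review across {len(waits)} PRs: {statistics.median(waits):.1f}")
```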
Do not use these metrics for individual performance reviews. The moment you reward fast review times, people rubber-stamp. The moment you penalize high comment counts, people stop providing feedback.
Handling Disagreements#
Disagreements in code review are healthy up to a point. Three practices keep them from becoming destructive.
The Two-Round Rule: After two rounds of disagreement on the same point, escalate to a synchronous conversation (call, meeting, or pairing session). Text-based back-and-forth is terrible for resolving nuanced design disagreements. Tone is lost, positions harden, and threads grow endlessly.
The Decision Record: When a disagreement reveals a genuine architectural question (not just style), resolve it by writing an ADR (Architecture Decision Record) or adding the decision to a team conventions document. This prevents the same argument from recurring in future reviews.
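An ADR does not need to be heavyweight. A minimal skeleton, loosely following the common Nygard-style format (the path and numbering are illustrative):
```markdown
<!-- docs/adr/0001-short-title.md -->
# 1. Short, decision-shaped title

- Status: Accepted
- Date: YYYY-MM-DD

## Context
The review disagreement or architectural question that prompted this, and the constraints that mattered.

## Decision
The option the team chose, stated in one or two sentences.

## Consequences
What becomes easier or harder as a result, and what future reviews should enforce.
```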
The Tiebreaker: If synchronous discussion does not resolve the disagreement, the code author gets the final call – with one exception. If the reviewer has identified a correctness or security issue, the reviewer’s concern must be addressed. “I prefer a different design” is the author’s call. “This has a race condition” is not.
In practice, most disagreements dissolve when both parties understand the other’s reasoning. The problem is usually insufficient context, not genuine incompatibility.