Accessibility Testing and Tooling Workflow
A practical workflow for automated and manual accessibility testing, covering tool selection, CI/CD integration, and testing strategies. Automated testing catches approximately 57% of accessibility issues (Deque, 2021)—the remaining 43% requires keyboard navigation testing, screen reader verification, and subjective judgment about content quality. This guide covers how to build a testing strategy that maximizes automated coverage while establishing the manual testing practices that no tool can replace.
Abstract
Accessibility testing requires a layered approach because different issue categories require different detection methods:
| Testing Layer | What It Catches | Coverage |
|---|---|---|
| Static analysis (eslint-plugin-jsx-a11y) | Missing alt attributes, invalid ARIA, semantic violations in JSX | ~15% of criteria, development-time |
| Runtime automation (axe-core, Pa11y) | Contrast ratios, duplicate IDs, missing labels, ARIA state validity | ~35% of WCAG criteria reliably |
| Manual testing (keyboard, screen readers) | Focus order logic, content meaning, navigation consistency | ~42% of criteria—non-automatable |
Why automation alone fails: WCAG 2.2’s 86 success criteria include subjective requirements—whether alt text accurately describes an image, whether error messages provide helpful guidance, whether focus order is logically intuitive. Tools can detect presence of alt text but cannot evaluate its correctness.
Tool selection principle: axe-core dominates because of its conservative rule engineering (minimizes false positives), making it safe for CI/CD gates. Pa11y adds HTML CodeSniffer’s distinct rule set for broader coverage. WAVE and Lighthouse serve quick audits but lack the rigor for compliance verification.
Testing workflow design:
- Shift-left: eslint-plugin-jsx-a11y catches issues in IDE before code commits
- Component/E2E: axe-core integration in Playwright/Cypress catches runtime issues
- CI gates: Pa11y-CI fails builds on critical violations
- Manual protocol: Keyboard + screen reader testing before each release
Automation Coverage: What Tools Actually Catch
The 57% figure from Deque’s study (13,000+ pages, 300,000+ issues) measures issue volume, not criteria count. Some issue types (missing labels, contrast failures) occur frequently—automation catches these reliably. Other criteria (meaningful sequence, focus order) rarely produce automatable signals.
WCAG Criteria by Automation Reliability
| Reliability | Criteria Count | Examples | Detection Confidence |
|---|---|---|---|
| High | ~13% | Color contrast ratios, duplicate IDs, missing form labels | Measurable technical requirements; minimal false positives |
| Partial | ~45% | Heading hierarchy, link purpose, error identification | Detect presence but not quality/correctness |
| None | ~42% | Alt text accuracy, focus order logic, caption timing | Require human judgment |
What “high reliability” means: Contrast ratio calculations are objective—4.5:1 for normal text, 3:1 for large text per WCAG 1.4.3. Tools calculate this deterministically. “High reliability” criteria have clear pass/fail thresholds without subjective interpretation.
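To make that determinism concrete, here is a minimal sketch of the WCAG relative-luminance and contrast-ratio math that automated checkers implement; the helper names are illustrative, not taken from any particular library:

```js
// Minimal sketch of the WCAG 2.x contrast math (illustrative helper names)
// Relative luminance: linearize each sRGB channel, then apply the WCAG weights
function relativeLuminance([r, g, b]) {
  const [R, G, B] = [r, g, b].map((c) => {
    const s = c / 255
    return s <= 0.03928 ? s / 12.92 : ((s + 0.055) / 1.055) ** 2.4
  })
  return 0.2126 * R + 0.7152 * G + 0.0722 * B
}

// Contrast ratio: (lighter + 0.05) / (darker + 0.05), ranging from 1:1 to 21:1
function contrastRatio(fg, bg) {
  const [hi, lo] = [relativeLuminance(fg), relativeLuminance(bg)].sort((a, b) => b - a)
  return (hi + 0.05) / (lo + 0.05)
}

// Example: #767676 text on a white background just clears the 4.5:1 threshold
console.log(contrastRatio([118, 118, 118], [255, 255, 255]).toFixed(2)) // ≈ 4.54
```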
What “partial” means: A tool can verify a heading exists after content but cannot determine if the heading accurately describes that content. It detects structural presence, not semantic correctness.
What “none” means: “Focus order preserves meaning and operability” (WCAG 2.4.3) requires understanding user intent and page purpose. No algorithm can determine if tab order is “logical” without understanding the content’s meaning.
Why axe-core’s 57% Matters for CI/CD
axe-core’s design philosophy prioritizes zero false positives over maximum coverage. From the axe-core documentation:
“Axe-core is designed to report only issues we’re confident are accessibility issues. We’d rather miss an issue than report a false positive.”
This makes axe-core safe for CI/CD gates—builds won’t fail for phantom issues. The trade-off: axe-core’s “incomplete” results (issues needing human review) are often ignored in CI pipelines, missing partial-detection opportunities.
Configuration for WCAG 2.2 AA compliance:
```js
const axeConfig = {
  runOnly: {
    type: "tag",
    values: ["wcag2a", "wcag2aa", "wcag21a", "wcag21aa", "wcag22aa"],
  },
  rules: {
    // Toggle individual rules here for known exceptions (contrast checking stays on)
    "color-contrast": { enabled: true },
    // Enable best-practice rules beyond WCAG
    region: { enabled: true },
  },
}
```

Tool Deep Dive: axe-core Ecosystem
axe-core (v4.11.x as of January 2026) provides 70+ accessibility rules and powers most modern testing integrations. Understanding its architecture helps configure it effectively.
Integration Options
Playwright (@axe-core/playwright) offers chainable configuration:
```js
import { test, expect } from "@playwright/test"
import AxeBuilder from "@axe-core/playwright"

test("checkout flow accessibility", async ({ page }) => {
  await page.goto("/checkout")

  const results = await new AxeBuilder({ page })
    .withTags(["wcag2a", "wcag2aa", "wcag22aa"])
    .exclude(".third-party-widget") // Known inaccessible embed
    .analyze()

  // Fail on violations, log incomplete for review
  expect(results.violations).toEqual([])
  if (results.incomplete.length > 0) {
    console.log("Manual review needed:", results.incomplete)
  }
})
```

Cypress (cypress-axe or Cypress Accessibility) provides cy.checkA11y():
```js
describe("Form Accessibility", () => {
  beforeEach(() => {
    cy.visit("/contact")
    cy.injectAxe()
  })

  it("form meets WCAG 2.2 AA", () => {
    cy.checkA11y(null, {
      runOnly: {
        type: "tag",
        values: ["wcag22aa"],
      },
    })
  })

  it("error states remain accessible", () => {
    cy.get("#email").type("invalid")
    cy.get("form").submit()
    cy.checkA11y() // Re-check after state change
  })
})
```

React (@axe-core/react) logs violations during development:
```jsx
import React from "react"
import ReactDOM from "react-dom/client"
import App from "./App"

if (process.env.NODE_ENV !== "production") {
  import("@axe-core/react").then((axe) => {
    axe.default(React, ReactDOM, 1000) // 1s debounce
  })
}

ReactDOM.createRoot(document.getElementById("root")).render(<App />)
```

axe-core Results Structure
axe-core returns four result categories—understanding them prevents ignoring useful signals:
| Category | Meaning | CI/CD Action |
|---|---|---|
| violations | Definite failures | Fail build |
| passes | Definite passes | No action |
| incomplete | Potential issues needing human review | Log for manual triage |
| inapplicable | Rules that don’t apply to page content | No action |
Common mistake: Ignoring incomplete results. These often flag issues like “review this image’s alt text”—not automatable but important for manual testing queues.
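One way to avoid losing that signal is to persist incomplete results as a build artifact rather than discarding them. A minimal Playwright sketch follows; the output filename and the artifact-upload wiring are assumptions, not part of axe-core:

```js
import { test, expect } from "@playwright/test"
import AxeBuilder from "@axe-core/playwright"
import { writeFileSync } from "node:fs"

test("home page a11y with triage artifact", async ({ page }) => {
  await page.goto("/")
  const results = await new AxeBuilder({ page }).analyze()

  // Persist "needs human review" items; a later CI step can upload this file
  if (results.incomplete.length > 0) {
    writeFileSync(
      "axe-incomplete.json", // assumed path; point your artifact upload step at it
      JSON.stringify(results.incomplete, null, 2)
    )
  }

  // Only definite failures break the build
  expect(results.violations).toEqual([])
})
```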
Tool Deep Dive: Pa11y and HTML CodeSniffer
Pa11y (v9.0.0, 2025) provides an alternative rule engine and excels at URL batch scanning. It uses HTML CodeSniffer (HTMLCS) by default but can run axe-core, or both simultaneously.
Architectural Differences from axe-core
| Aspect | Pa11y (HTMLCS) | axe-core |
|---|---|---|
| Result model | Violations only | Violations + incomplete + passes |
| Philosophy | Definite issues | Definite + potential issues |
| Rule count | ~70 checks | 70+ rules |
| False positives | Moderate | Very low (by design) |
Why use both: HTMLCS and axe-core have different rule implementations. Running both (runners: ['axe', 'htmlcs']) catches ~35% of WCAG issues—more than either alone—because their rule sets partially overlap but cover different edge cases.
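For ad-hoc scripts outside CI, the same dual-runner setup works through Pa11y's Node API. A minimal sketch, assuming a locally running app at a placeholder URL (the issue fields shown match Pa11y's reported issue shape):

```js
const pa11y = require("pa11y")

async function scan(url) {
  // Run both engines; issues from each runner are merged into one list
  const results = await pa11y(url, {
    runners: ["axe", "htmlcs"],
    standard: "WCAG2AA",
  })

  for (const issue of results.issues) {
    console.log(`[${issue.runner}] ${issue.code}: ${issue.message}`)
  }
  return results.issues.length
}

scan("http://localhost:3000/").then((count) => {
  process.exitCode = count > 0 ? 1 : 0
})
```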
Pa11y-CI for Pipeline Integration
Pa11y-CI (v4.0.0) is purpose-built for CI/CD. It fails pipelines on violations (unlike informational tools):
{ "defaults": { "runners": ["axe", "htmlcs"], "standard": "WCAG2AA", "timeout": 30000, "wait": 1000 }, "urls": ["http://localhost:3000/", "http://localhost:3000/contact", "http://localhost:3000/checkout"]}15 collapsed lines
```yaml
name: Accessibility
on: [push, pull_request]

jobs:
  a11y:
    runs-on: ubuntu-latest
    steps:
      - uses: actions/checkout@v4
      - uses: actions/setup-node@v4
        with:
          node-version: "22"
      - run: npm ci
      - run: npm run build
      - run: npm run preview &
      - run: npx wait-on http://localhost:4321

      - name: Pa11y CI
        run: npx pa11y-ci

      - name: Upload results
        if: failure()
        uses: actions/upload-artifact@v4
        with:
          name: pa11y-report
          path: pa11y-ci-results.json
```

Edge case: Pa11y requires the page to be fully rendered. Use wait to delay testing after JavaScript execution, or actions to interact with the page before scanning:
{ "urls": [ { "url": "http://localhost:3000/login", "actions": [ "set field #email to test@example.com", "set field #password to password123", "click element #submit", "wait for element #dashboard to be visible" ] } ]}Static Analysis: eslint-plugin-jsx-a11y
eslint-plugin-jsx-a11y performs static AST analysis of JSX—catching issues before runtime with zero performance impact. It’s the first line of defense in a shift-left strategy.
Rule Categories
The plugin provides ~30 rules across categories:
- Alternative text: alt-text, img-redundant-alt
- ARIA validity: aria-props, aria-proptypes, aria-role, aria-unsupported-elements
- Semantic HTML: anchor-has-content, anchor-is-valid, heading-has-content
- Interaction: click-events-have-key-events, no-static-element-interactions
- Labels: label-has-associated-control
Configuration
```jsonc
{
  "extends": [
    "eslint:recommended",
    "plugin:react/recommended",
    "plugin:jsx-a11y/recommended"
  ],
  "plugins": ["jsx-a11y"],
  "rules": {
    // Override specific rules
    "jsx-a11y/anchor-is-valid": [
      "error",
      {
        "components": ["Link"],
        "specialLink": ["to"]
      }
    ],
    // Allow onClick on divs with role="button"
    "jsx-a11y/no-static-element-interactions": [
      "error",
      {
        "allowExpressionValues": true,
        "handlers": ["onClick"]
      }
    ]
  }
}
```

Limitations
Static analysis cannot detect:
- Runtime accessibility (focus management, live regions)
- Dynamic content quality (generated alt text accuracy)
- Component composition issues (label associations across components)
- Third-party component accessibility
Design rationale: eslint-plugin-jsx-a11y catches the “low-hanging fruit” that developers often miss during coding. It doesn’t replace runtime testing—it prevents obvious issues from reaching that stage.
Browser Extensions: WAVE and axe DevTools
Browser extensions serve manual testing workflows—they’re interactive tools for developers, not CI/CD automation.
WAVE
Strengths:
- Runs entirely client-side (safe for intranets, authenticated pages)
- Visual overlay shows issues in page context
- Explains issues for non-specialists
Limitations:
- Manual page-by-page operation
- Higher false positive rate than axe-core
- Cannot verify content quality (alt text accuracy)
- No CI/CD integration
Use case: Developer education and stakeholder communication. The visual overlay helps explain accessibility concepts to designers and PMs.
axe DevTools
The browser extension version of axe-core provides:
- Same rule engine as automated tests
- Interactive issue exploration
- Guided remediation suggestions
- Export for tracking
Use case: Debugging specific issues found in automated tests. The extension’s “Intelligent Guided Tests” walk through semi-automated checks for issues axe-core marks as “incomplete.”
Lighthouse Accessibility Audits
Lighthouse runs a subset of axe-core rules (~25-30 of 70+). It’s designed for quick health checks, not compliance verification.
When Lighthouse Falls Short
| Lighthouse Reports | axe-core Catches |
|---|---|
| Basic contrast issues | All contrast permutations |
| Missing form labels | Label association edge cases |
| Alt text presence | Alt text in SVGs, custom components |
| Basic ARIA | Complex ARIA widget patterns |
Practical guidance: Run Lighthouse for quick feedback during development. Run axe-core in your test suite for compliance. Don’t rely on a passing Lighthouse score for WCAG conformance.
Manual Testing: The Non-Negotiable 43%
Approximately 42% of WCAG criteria cannot be automated because they require subjective judgment. These criteria determine whether content works for users with disabilities, not just whether technical requirements are met.
Keyboard Navigation Testing
No reliable automation exists for keyboard navigation quality. The test protocol:
Navigation keys:
- Tab: Forward through interactive elements
- Shift+Tab: Backward navigation
- Enter/Space: Activate buttons and links
- Arrow keys: Navigate within widgets (menus, tabs, autocomplete)
- Escape: Close modals and dropdowns
What to verify:
- All functionality accessible: Every action achievable with mouse must work with keyboard
- Logical focus order: Tab sequence follows visual layout (top-to-bottom, left-to-right in LTR languages)
- Visible focus indicators: Every focused element shows a clearly visible indicator (WCAG 2.4.7); a 2px outline with offset is a solid baseline
- No keyboard traps: User can always Tab away from any element
- Focus management in modals: Focus trapped inside, returned on close (see the sketch right after this list)
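A minimal focus-trap sketch for the modal requirement above; the focusable-element selector and element handling are simplified for illustration:

```js
// Trap Tab/Shift+Tab inside a modal and restore focus on close (simplified sketch)
function trapFocus(modal) {
  const previouslyFocused = document.activeElement
  const focusable = modal.querySelectorAll(
    'a[href], button, input, select, textarea, [tabindex]:not([tabindex="-1"])'
  )
  const first = focusable[0]
  const last = focusable[focusable.length - 1]

  function onKeydown(event) {
    if (event.key !== "Tab") return
    if (event.shiftKey && document.activeElement === first) {
      event.preventDefault()
      last.focus() // Wrap backward from the first element to the last
    } else if (!event.shiftKey && document.activeElement === last) {
      event.preventDefault()
      first.focus() // Wrap forward from the last element to the first
    }
  }

  modal.addEventListener("keydown", onKeydown)
  first?.focus()

  // Call the returned function when the modal closes
  return function release() {
    modal.removeEventListener("keydown", onKeydown)
    previouslyFocused?.focus()
  }
}
```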
Common failure: SPA route changes don’t move focus. Users Tab through the old page’s elements until reaching new content.
```js
// After route change, move focus to main content
function handleRouteChange() {
  const main = document.querySelector("main")
  main.setAttribute("tabindex", "-1")
  main.focus()
  // Remove tabindex after focus to prevent mouse focus outline
  main.addEventListener("blur", () => main.removeAttribute("tabindex"), { once: true })
}
```

Screen Reader Testing
Screen readers reveal issues invisible to sighted testing: missing labels, illogical heading structure, inadequate live region announcements.
Testing matrix (minimum coverage):
| Platform | Screen Reader | Usage Share | Priority |
|---|---|---|---|
| Windows | NVDA | ~40% | Required |
| Windows | JAWS | ~30% | Enterprise contexts |
| macOS/iOS | VoiceOver | ~15% | Apple users |
| Android | TalkBack | ~10% | Mobile users |
NVDA vs JAWS behavioral differences:
- NVDA strictly follows DOM/accessibility tree—exposes missing labels, broken associations
- JAWS uses heuristics to infer missing information—masks some issues but improves real-world usability
Test with both when possible. NVDA catches structural problems; JAWS reveals whether heuristics compensate for your issues (they shouldn’t be necessary).
Testing protocol:
- Navigate entire page in reading mode (not just interactive elements)
- Complete primary user flows (forms, checkout, search)
- Verify dynamic content announces (live regions, error states); see the live-region sketch after this list
- Test error recovery (can user understand and fix input errors?)
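For the dynamic-content check, here is a minimal sketch of a polite live region that screen readers should announce when its text changes; the element id and CSS class are assumptions for this example:

```js
// Ensure a single polite live region exists, then announce status messages into it
function announce(message) {
  let region = document.getElementById("sr-status") // assumed id for this sketch
  if (!region) {
    region = document.createElement("div")
    region.id = "sr-status"
    region.setAttribute("role", "status")       // implies a polite live region
    region.setAttribute("aria-live", "polite")  // explicit for older assistive tech
    region.className = "visually-hidden"        // assumed class: hidden visually, present in the accessibility tree
    document.body.appendChild(region)
  }
  // Changing the text content triggers the announcement
  region.textContent = message
}

// Example: announce a form error after validation
announce("Email must include an @ symbol.")
```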
Content Quality Assessment
These criteria require human judgment—no automation possible:
Alternative text (1.1.1): Does alt text convey the image’s purpose in context? “Graph showing sales data” is technically present but useless. “Sales increased 25% from Q1 to Q3” conveys meaning.
Meaningful sequence (1.3.2): Does reading order make sense when CSS positioning is ignored? Screen readers follow DOM order, not visual order.
Link purpose (2.4.4): Can users understand link destinations? “Click here” provides no context; “Download annual report (PDF, 2.4MB)” does.
Error suggestions (3.3.3): Do error messages explain how to fix the problem? “Invalid input” fails; “Email must include @ symbol” succeeds.
Bug Triage and Prioritization
Not all accessibility issues have equal impact. Prioritize by user impact and legal risk:
Severity Framework
| Severity | Definition | Examples | Response |
|---|---|---|---|
| Critical | Complete barrier—task cannot be completed | No keyboard access to submit button, missing form labels, keyboard trap | Fix immediately |
| Serious | Major difficulty—task very hard to complete | Poor contrast, confusing focus order, missing error identification | Fix this sprint |
| Moderate | Inconvenience—task harder than necessary | Redundant alt text, minor contrast issues, verbose labels | Fix this quarter |
| Minor | Best practice—not a barrier | Missing landmark roles, suboptimal heading levels | Backlog |
WCAG Level Mapping
| WCAG Level | User Impact | Legal Risk |
|---|---|---|
| Level A failures | Complete barriers—AT cannot function | High—baseline requirement |
| Level AA failures | Significant barriers—tasks very difficult | High—legal compliance target |
| Level AAA failures | Maximum accessibility—specialized contexts | Low—not universally required |
Issue Documentation Template
Track issues with sufficient context for developers:
## Issue: Missing label on email input
**Severity**: Critical
**WCAG**: 1.3.1 Info and Relationships (Level A)
**Page**: /checkout
**Tool**: axe-core (violations[0])
### Description
Email input field has no programmatic label. Screen reader users cannot identify the field's purpose.
### Current markup
<input type="email" name="email" placeholder="Email">
### Recommended fix
<label for="checkout-email">Email address</label><input type="email" id="checkout-email" name="email">
### Verification
- [ ] axe-core passes
- [ ] NVDA announces "Email address, edit"
- [ ] Label visible and associated

CI/CD Pipeline Architecture
Build a multi-stage pipeline that catches issues at appropriate development phases:
Stage 1: Pre-commit (Development Time)
{ "lint-staged": { "*.{js,jsx,ts,tsx}": ["eslint --fix"] }, "husky": { "hooks": { "pre-commit": "lint-staged" } }}eslint-plugin-jsx-a11y catches static violations before code enters the repository.
Stage 2: Pull Request (Automated Testing)
```yaml
name: PR Checks
on: pull_request

jobs:
  test:
    runs-on: ubuntu-latest
    steps:
      - uses: actions/checkout@v4
      - uses: actions/setup-node@v4
        with:
          node-version: "22"
      - run: npm ci

      - name: Lint (includes a11y rules)
        run: npm run lint

      - name: Unit + Component Tests (includes axe)
        run: npm test

      - name: E2E Tests (Playwright + axe-core)
        run: npx playwright test
```

Stage 3: Pre-deploy (Full Audit)
```yaml
name: Deploy
on:
  push:
    branches: [main]

jobs:
  audit:
    runs-on: ubuntu-latest
    steps:
      - uses: actions/checkout@v4
      - uses: actions/setup-node@v4
        with:
          node-version: "22"
      - run: npm ci
      - run: npm run build
      - run: npm run preview &
      - run: npx wait-on http://localhost:4321

      - name: Pa11y-CI (dual runner)
        run: npx pa11y-ci

      - name: Lighthouse CI
        run: npx lhci autorun

  deploy:
    needs: audit
    runs-on: ubuntu-latest
    steps:
      - name: Deploy to production
        run: ./deploy.sh
```

Quality Gates
Configure thresholds that prevent regressions:
```js
module.exports = {
  ci: {
    assert: {
      assertions: {
        "categories:accessibility": ["error", { minScore: 0.9 }],
        // Specific audits
        "color-contrast": "error",
        "document-title": "error",
        "html-has-lang": "error",
        "meta-viewport": "error",
      },
    },
  },
}
```

Pa11y threshold:
{ "defaults": { "threshold": 0 }}A threshold of 0 fails on any violation. For legacy codebases, start with the current violation count and reduce over time:
{ "defaults": { "threshold": 15 }}Conclusion
Accessibility testing requires defense in depth: static analysis catches syntax errors, runtime automation catches technical violations, and manual testing catches experiential issues. No single tool provides complete coverage because ~42% of WCAG criteria require human judgment.
Build your pipeline around this reality:
- Shift-left with eslint-plugin-jsx-a11y—catch obvious issues at development time
- Gate PRs with axe-core in Playwright/Cypress—prevent regressions from merging
- Audit with Pa11y-CI pre-deploy—dual-runner coverage catches more edge cases
- Manual test before releases—keyboard and screen reader testing are non-negotiable
The 57% automated coverage is a floor, not a ceiling. With disciplined manual testing, you can catch 80-90% of issues before users encounter them. The remaining 10-20% requires user testing with people who actually use assistive technology—but that’s a topic for another article.
Appendix
Prerequisites
- Familiarity with WCAG 2.2 success criteria structure (see WCAG 2.2 Practical Guide)
- Experience with JavaScript testing frameworks (Jest, Playwright, or Cypress)
- Basic understanding of assistive technology categories (screen readers, switch devices, voice control)
- CI/CD pipeline concepts (GitHub Actions, GitLab CI, or similar)
Terminology
- axe-core: Open-source accessibility testing engine by Deque, used by most modern testing integrations
- HTML CodeSniffer (HTMLCS): Alternative accessibility rule engine used by Pa11y by default
- AST (Abstract Syntax Tree): Code representation that eslint-plugin-jsx-a11y analyzes for static violations
- incomplete results: axe-core’s category for issues requiring human review—not definite violations but potential problems
- shift-left: Moving quality checks earlier in the development process (from deployment to development time)
- quality gate: Automated check that prevents code progression (merge, deploy) if criteria aren’t met
Summary
- Automated testing catches ~57% of accessibility issues—measured by issue volume, not criteria count
- ~42% of WCAG criteria require human judgment for content quality, focus order logic, and user experience
- axe-core dominates because its conservative rule engineering minimizes false positives, making it safe for CI/CD gates
- Pa11y with dual runners (axe + HTMLCS) catches more edge cases than either engine alone
- eslint-plugin-jsx-a11y provides shift-left coverage for JSX codebases—zero runtime cost
- Manual testing is non-negotiable: keyboard navigation and screen reader testing cannot be automated effectively
- Screen reader testing should use NVDA (catches structural issues) and JAWS (reveals heuristic compensation) when possible
- Issue triage should prioritize by user impact (complete barrier vs. inconvenience) and WCAG level (A/AA violations first)
References
Specifications
- WCAG 2.2 W3C Recommendation - Normative success criteria
- Understanding WCAG 2.2 - Techniques and intent for each criterion
- W3C ACT Rules - Accessibility Conformance Testing rule format
Official Documentation
- axe-core GitHub - Source code and API documentation
- axe-core Rule Descriptions - Detailed explanation of each rule
- Pa11y Documentation - CLI and CI tool usage
- Pa11y-CI GitHub - CI integration configuration
- eslint-plugin-jsx-a11y GitHub - Static analysis rules
- Playwright Accessibility Testing - @axe-core/playwright integration
- Cypress Accessibility - cypress-axe and Cypress Accessibility
Research and Analysis
- Deque: Automated Testing Identifies 57% of Issues - Methodology and findings
- Accessible.org: What Percentage of WCAG Can Be Automated? - Criteria-level analysis
- WebAIM Million - Annual analysis of homepage accessibility
Tools
- axe DevTools Browser Extension - Interactive testing
- WAVE Evaluation Tool - Browser extension for manual testing
- NVDA Screen Reader - Free Windows screen reader
- JAWS Screen Reader - Commercial Windows screen reader
Web Foundations / Accessibility Standards 16 min readWeb Content Accessibility Guidelines (WCAG) 2.2 became a W3C Recommendation in October 2023, adding 9 new success criteria focused on cognitive accessibility, mobile interaction, and focus visibility. This guide covers implementation strategies for semantic HTML, ARIA patterns, and testing methodologies—practical knowledge for building inclusive web experiences that meet legal requirements in the US (ADA) and EU (European Accessibility Act).