14 min read

Accessibility Testing and Tooling Workflow

A practical workflow for automated and manual accessibility testing, covering tool selection, CI/CD integration, and testing strategies. Automated testing catches approximately 57% of accessibility issues (Deque, 2021)—the remaining 43% requires keyboard navigation testing, screen reader verification, and subjective judgment about content quality. This guide covers how to build a testing strategy that maximizes automated coverage while establishing the manual testing practices that no tool can replace.

[Figure: Accessibility testing workflow. Development time: eslint-plugin-jsx-a11y (static AST analysis) and IDE extensions (real-time feedback); component tests: @axe-core/react; E2E tests: Playwright + axe-core; CI pipeline: Pa11y-CI / Cypress; manual layer: keyboard navigation (tab order, focus traps), screen readers (NVDA, JAWS, VoiceOver), and browser extensions (WAVE, axe DevTools). Automated: ~57% of issues; manual + user testing: ~43%.]

Accessibility testing workflow: automated tools catch structural issues early; manual testing catches semantic and experiential issues that require human judgment

Accessibility testing requires a layered approach because different issue categories require different detection methods:

| Testing Layer | What It Catches | Coverage |
| --- | --- | --- |
| Static analysis (eslint-plugin-jsx-a11y) | Missing alt attributes, invalid ARIA, semantic violations in JSX | ~15% of criteria, development-time |
| Runtime automation (axe-core, Pa11y) | Contrast ratios, duplicate IDs, missing labels, ARIA state validity | ~35% of WCAG criteria reliably |
| Manual testing (keyboard, screen readers) | Focus order logic, content meaning, navigation consistency | ~42% of criteria—non-automatable |

Why automation alone fails: WCAG 2.2’s 86 success criteria include subjective requirements—whether alt text accurately describes an image, whether error messages provide helpful guidance, whether focus order is logically intuitive. Tools can detect presence of alt text but cannot evaluate its correctness.

Tool selection principle: axe-core dominates because of its conservative rule engineering (minimizes false positives), making it safe for CI/CD gates. Pa11y adds HTML CodeSniffer’s distinct rule set for broader coverage. WAVE and Lighthouse serve quick audits but lack the rigor for compliance verification.

Testing workflow design:

  1. Shift-left: eslint-plugin-jsx-a11y catches issues in IDE before code commits
  2. Component/E2E: axe-core integration in Playwright/Cypress catches runtime issues
  3. CI gates: Pa11y-CI fails builds on critical violations
  4. Manual protocol: Keyboard + screen reader testing before each release

The 57% figure from Deque’s study (13,000+ pages, 300,000+ issues) measures issue volume, not criteria count. Some issue types (missing labels, contrast failures) occur frequently—automation catches these reliably. Other criteria (meaningful sequence, focus order) rarely produce automatable signals.

| Reliability | Criteria Count | Examples | Detection Confidence |
| --- | --- | --- | --- |
| High | ~13% | Color contrast ratios, duplicate IDs, missing form labels | Measurable technical requirements; minimal false positives |
| Partial | ~45% | Heading hierarchy, link purpose, error identification | Detect presence but not quality/correctness |
| None | ~42% | Alt text accuracy, focus order logic, caption timing | Require human judgment |

What “high reliability” means: Contrast ratio calculations are objective—4.5:1 for normal text, 3:1 for large text per WCAG 1.4.3. Tools calculate this deterministically. “High reliability” criteria have clear pass/fail thresholds without subjective interpretation.
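To see why this category automates cleanly, the math itself is mechanical. Here is a minimal sketch of the WCAG luminance and contrast calculation; the function names and sample colors are illustrative, not taken from any particular tool:

contrast-ratio.js
// Relative luminance per WCAG: linearize each sRGB channel, then weight it
function luminance([r, g, b]) {
  const [R, G, B] = [r, g, b].map((v) => {
    const c = v / 255
    return c <= 0.03928 ? c / 12.92 : ((c + 0.055) / 1.055) ** 2.4
  })
  return 0.2126 * R + 0.7152 * G + 0.0722 * B
}

// Contrast ratio = (lighter + 0.05) / (darker + 0.05), ranging from 1:1 to 21:1
function contrastRatio(foreground, background) {
  const [hi, lo] = [luminance(foreground), luminance(background)].sort((a, b) => b - a)
  return (hi + 0.05) / (lo + 0.05)
}

// #767676 on #ffffff is ~4.54:1, just above the 4.5:1 threshold for normal text
console.log(contrastRatio([118, 118, 118], [255, 255, 255]).toFixed(2))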

What “partial” means: A tool can verify a heading exists after content but cannot determine if the heading accurately describes that content. It detects structural presence, not semantic correctness.

What “none” means: “Focus order preserves meaning and operability” (WCAG 2.4.3) requires understanding user intent and page purpose. No algorithm can determine if tab order is “logical” without understanding the content’s meaning.

axe-core’s design philosophy prioritizes zero false positives over maximum coverage. From the axe-core documentation:

“Axe-core is designed to report only issues we’re confident are accessibility issues. We’d rather miss an issue than report a false positive.”

This makes axe-core safe for CI/CD gates—builds won’t fail for phantom issues. The trade-off: axe-core’s “incomplete” results (issues needing human review) are often ignored in CI pipelines, missing partial-detection opportunities.

Configuration for WCAG 2.2 AA compliance:

axe-config.js
const axeConfig = {
  runOnly: {
    type: "tag",
    values: ["wcag2a", "wcag2aa", "wcag21a", "wcag21aa", "wcag22aa"],
  },
  rules: {
    // Keep contrast checking on; disable individual rules here only for documented exceptions
    "color-contrast": { enabled: true },
    // Enable best-practice rules beyond WCAG
    region: { enabled: true },
  },
}
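When scripting the engine directly rather than through a framework integration, the same object is passed as the options argument to axe.run(). A minimal sketch, assuming the config above is in scope and the page has finished loading:

run-axe.js
// Scan the whole document with the shared configuration
import axe from "axe-core"

const results = await axe.run(document, axeConfig)
console.log(`${results.violations.length} violations, ${results.incomplete.length} items need human review`)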

axe-core (v4.11.x as of January 2026) provides 70+ accessibility rules and powers most modern testing integrations. Understanding its architecture helps configure it effectively.

Playwright (@axe-core/playwright) offers chainable configuration:

playwright-a11y.spec.js
import { test, expect } from "@playwright/test"
import AxeBuilder from "@axe-core/playwright"

test("checkout flow accessibility", async ({ page }) => {
  await page.goto("/checkout")

  const results = await new AxeBuilder({ page })
    .withTags(["wcag2a", "wcag2aa", "wcag22aa"])
    .exclude(".third-party-widget") // Known inaccessible embed
    .analyze()

  // Fail on violations, log incomplete for review
  expect(results.violations).toEqual([])
  if (results.incomplete.length > 0) {
    console.log("Manual review needed:", results.incomplete)
  }
})

Cypress (cypress-axe or Cypress Accessibility) provides cy.checkA11y():

cypress-a11y.spec.js
describe("Form Accessibility", () => {
beforeEach(() => {
cy.visit("/contact")
cy.injectAxe()
})
it("form meets WCAG 2.2 AA", () => {
cy.checkA11y(null, {
runOnly: {
type: "tag",
values: ["wcag22aa"],
},
})
})
it("error states remain accessible", () => {
cy.get("#email").type("invalid")
cy.get("form").submit()
cy.checkA11y() // Re-check after state change
})
})

React (@axe-core/react) logs violations during development:

index.jsx
import React from "react"
import ReactDOM from "react-dom/client"
import App from "./App"

if (process.env.NODE_ENV !== "production") {
  import("@axe-core/react").then((axe) => {
    axe.default(React, ReactDOM, 1000) // 1s debounce
  })
}

ReactDOM.createRoot(document.getElementById("root")).render(<App />)
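For the component-test layer, jest-axe runs the same engine against rendered output inside unit tests. A sketch assuming React Testing Library and a hypothetical ContactForm component:

contact-form.test.jsx
import { render } from "@testing-library/react"
import { axe, toHaveNoViolations } from "jest-axe"
import ContactForm from "./ContactForm" // hypothetical component under test

expect.extend(toHaveNoViolations)

test("ContactForm has no detectable accessibility violations", async () => {
  const { container } = render(<ContactForm />)
  const results = await axe(container)
  expect(results).toHaveNoViolations()
})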

axe-core returns four result categories—understanding them prevents ignoring useful signals:

| Category | Meaning | CI/CD Action |
| --- | --- | --- |
| violations | Definite failures | Fail build |
| passes | Definite passes | No action |
| incomplete | Potential issues needing human review | Log for manual triage |
| inapplicable | Rules that don’t apply to page content | No action |

Common mistake: Ignoring incomplete results. These often flag issues like “review this image’s alt text”—not automatable but important for manual testing queues.
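One way to act on this without failing builds for non-definite issues: keep violations as the gate, but persist incomplete results as an artifact for the manual-testing queue. A sketch using the Playwright integration shown earlier; the report filename is an arbitrary choice:

collect-incomplete.spec.js
import fs from "node:fs"
import { test, expect } from "@playwright/test"
import AxeBuilder from "@axe-core/playwright"

test("home page: gate on violations, collect incomplete", async ({ page }) => {
  await page.goto("/")
  const results = await new AxeBuilder({ page }).analyze()

  // Definite failures still break the build
  expect(results.violations).toEqual([])

  // Potential issues go to a report for human triage instead of being discarded
  if (results.incomplete.length > 0) {
    const summary = results.incomplete.map((r) => ({ id: r.id, help: r.help, nodes: r.nodes.length }))
    fs.writeFileSync("a11y-incomplete.json", JSON.stringify(summary, null, 2))
  }
})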

Pa11y (v9.0.0, 2025) provides an alternative rule engine and excels at URL batch scanning. It uses HTML CodeSniffer (HTMLCS) by default but can run axe-core, or both simultaneously.

| Aspect | Pa11y (HTMLCS) | axe-core |
| --- | --- | --- |
| Result model | Violations only | Violations + incomplete + passes |
| Philosophy | Definite issues | Definite + potential issues |
| Rule count | ~70 checks | 70+ rules |
| False positives | Moderate | Very low (by design) |

Why use both: HTMLCS and axe-core have different rule implementations. Running both (runners: ['axe', 'htmlcs']) catches ~35% of WCAG issues—more than either alone—because their rule sets partially overlap but cover different edge cases.
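Pa11y also exposes a Node API, so the dual-runner combination can be scripted directly rather than only through Pa11y-CI. A minimal sketch; the URL is a placeholder:

pa11y-dual-runner.js
import pa11y from "pa11y"

// Run both engines against one page; issues from each runner are merged into a single list
const results = await pa11y("http://localhost:3000/", {
  runners: ["axe", "htmlcs"],
  standard: "WCAG2AA",
  timeout: 30000,
})

console.log(`${results.issues.length} issues on ${results.pageUrl}`)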

Pa11y-CI (v4.0.0) is purpose-built for CI/CD. It fails pipelines on violations (unlike informational tools):

.pa11yci.json
{
  "defaults": {
    "runners": ["axe", "htmlcs"],
    "standard": "WCAG2AA",
    "timeout": 30000,
    "wait": 1000
  },
  "urls": [
    "http://localhost:3000/",
    "http://localhost:3000/contact",
    "http://localhost:3000/checkout"
  ]
}
.github/workflows/a11y.yml
name: Accessibility
on: [push, pull_request]

jobs:
  a11y:
    runs-on: ubuntu-latest
    steps:
      - uses: actions/checkout@v4
      - uses: actions/setup-node@v4
        with:
          node-version: "22"
      - run: npm ci
      - run: npm run build
      - run: npm run preview &
      - run: npx wait-on http://localhost:3000
      - name: Pa11y CI
        run: npx pa11y-ci
      - name: Upload results
        if: failure()
        uses: actions/upload-artifact@v4
        with:
          name: pa11y-report
          path: pa11y-ci-results.json

Edge case: Pa11y requires the page to be fully rendered. Use wait to delay testing after JavaScript execution, or actions to interact with the page before scanning:

.pa11yci.json
{
  "urls": [
    {
      "url": "http://localhost:3000/login",
      "actions": [
        "set field #email to test@example.com",
        "set field #password to password123",
        "click element #submit",
        "wait for element #dashboard to be visible"
      ]
    }
  ]
}

eslint-plugin-jsx-a11y performs static AST analysis of JSX—catching issues before runtime with zero performance impact. It’s the first line of defense in a shift-left strategy.

The plugin provides ~30 rules across categories:

  • Alternative text: alt-text, img-redundant-alt
  • ARIA validity: aria-props, aria-proptypes, aria-role, aria-unsupported-elements
  • Semantic HTML: anchor-has-content, anchor-is-valid, heading-has-content
  • Interaction: click-events-have-key-events, no-static-element-interactions
  • Labels: label-has-associated-control
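As a concrete example of the interaction rules, a clickable div with no keyboard handler trips click-events-have-key-events (and no-static-element-interactions); the usual fix is a native button. A small sketch with illustrative component names:

save-button.jsx
// Flagged: mouse-only interaction on a non-interactive element
const SaveDiv = ({ onSave }) => <div onClick={onSave}>Save</div>

// Clean: a native <button> is focusable and activates with Enter/Space by default
const SaveButton = ({ onSave }) => (
  <button type="button" onClick={onSave}>
    Save
  </button>
)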

.eslintrc.json
{
  "extends": [
    "eslint:recommended",
    "plugin:react/recommended",
    "plugin:jsx-a11y/recommended"
  ],
  "plugins": ["jsx-a11y"],
  "rules": {
    // Override specific rules
    "jsx-a11y/anchor-is-valid": [
      "error",
      {
        "components": ["Link"],
        "specialLink": ["to"]
      }
    ],
    // Allow onClick on divs with role="button"
    "jsx-a11y/no-static-element-interactions": [
      "error",
      {
        "allowExpressionValues": true,
        "handlers": ["onClick"]
      }
    ]
  }
}

Static analysis cannot detect:

  • Runtime accessibility (focus management, live regions)
  • Dynamic content quality (generated alt text accuracy)
  • Component composition issues (label associations across components)
  • Third-party component accessibility

Design rationale: eslint-plugin-jsx-a11y catches the “low-hanging fruit” that developers often miss during coding. It doesn’t replace runtime testing—it prevents obvious issues from reaching that stage.

Browser extensions serve manual testing workflows—they’re interactive tools for developers, not CI/CD automation. WAVE (WebAIM’s evaluation tool) is the most common example.

Strengths:

  • Runs entirely client-side (safe for intranets, authenticated pages)
  • Visual overlay shows issues in page context
  • Explains issues for non-specialists

Limitations:

  • Manual page-by-page operation
  • Higher false positive rate than axe-core
  • Cannot verify content quality (alt text accuracy)
  • No CI/CD integration

Use case: Developer education and stakeholder communication. The visual overlay helps explain accessibility concepts to designers and PMs.

axe DevTools, the browser extension version of axe-core, provides:

  • Same rule engine as automated tests
  • Interactive issue exploration
  • Guided remediation suggestions
  • Export for tracking

Use case: Debugging specific issues found in automated tests. The extension’s “Intelligent Guided Tests” walk through semi-automated checks for issues axe-core marks as “incomplete.”

Lighthouse runs a subset of axe-core rules (~25-30 of 70+). It’s designed for quick health checks, not compliance verification.

| Lighthouse Reports | axe-core Catches |
| --- | --- |
| Basic contrast issues | All contrast permutations |
| Missing form labels | Label association edge cases |
| Alt text presence | Alt text in SVGs, custom components |
| Basic ARIA | Complex ARIA widget patterns |

Practical guidance: Run Lighthouse for quick feedback during development. Run axe-core in your test suite for compliance. Don’t rely on a passing Lighthouse score for WCAG conformance.
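For that quick feedback loop, Lighthouse can also be scripted through its Node API. A sketch assuming chrome-launcher is installed and a local server is already running; the URL and flags are illustrative:

lighthouse-quick-check.js
import lighthouse from "lighthouse"
import * as chromeLauncher from "chrome-launcher"

// Audit only the accessibility category for fast local feedback
const chrome = await chromeLauncher.launch({ chromeFlags: ["--headless"] })
const { lhr } = await lighthouse("http://localhost:3000/", {
  port: chrome.port,
  onlyCategories: ["accessibility"],
})
console.log("Accessibility score:", Math.round(lhr.categories.accessibility.score * 100))
await chrome.kill()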

Approximately 42% of WCAG criteria cannot be automated because they require subjective judgment. These criteria determine whether content works for users with disabilities, not just whether technical requirements are met.

No reliable automation exists for keyboard navigation quality. The test protocol:

Navigation keys:

  • Tab: Forward through interactive elements
  • Shift+Tab: Backward navigation
  • Enter/Space: Activate buttons and links
  • Arrow keys: Navigate within widgets (menus, tabs, autocomplete)
  • Escape: Close modals and dropdowns

What to verify:

  1. All functionality accessible: Every action achievable with mouse must work with keyboard
  2. Logical focus order: Tab sequence follows visual layout (top-to-bottom, left-to-right in LTR languages)
  3. Visible focus indicators: Every focusable element shows a clearly visible indicator (WCAG 2.4.7); a 2px outline with offset is a common baseline
  4. No keyboard traps: User can always Tab away from any element
  5. Focus management in modals: Focus trapped inside, returned on close (a minimal trap sketch follows this list)
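The following is a framework-free illustration of point 5; real projects usually rely on a well-tested dialog library, and the selector list here is deliberately simplified:

focus-trap-sketch.js
// Keep Tab/Shift+Tab cycling inside an open dialog and restore focus on close
function trapFocus(dialog) {
  const opener = document.activeElement // control that launched the dialog
  const focusables = dialog.querySelectorAll(
    'a[href], button, input, select, textarea, [tabindex]:not([tabindex="-1"])'
  )
  const first = focusables[0]
  const last = focusables[focusables.length - 1]

  function onKeydown(event) {
    if (event.key === "Escape") return close()
    if (event.key !== "Tab") return
    if (event.shiftKey && document.activeElement === first) {
      event.preventDefault()
      last.focus()
    } else if (!event.shiftKey && document.activeElement === last) {
      event.preventDefault()
      first.focus()
    }
  }

  function close() {
    dialog.removeEventListener("keydown", onKeydown)
    dialog.hidden = true
    opener?.focus() // return focus to the triggering control
  }

  dialog.addEventListener("keydown", onKeydown)
  first?.focus()
  return close
}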

Common failure: SPA route changes don’t move focus. Users Tab through the old page’s elements until reaching new content.

spa-focus.js
// After route change, move focus to main content
function handleRouteChange() {
  const main = document.querySelector("main")
  main.setAttribute("tabindex", "-1")
  main.focus()
  // Remove tabindex after focus to prevent mouse focus outline
  main.addEventListener("blur", () => main.removeAttribute("tabindex"), { once: true })
}

Screen readers reveal issues invisible to sighted testing: missing labels, illogical heading structure, inadequate live region announcements.

Testing matrix (minimum coverage):

| Platform | Screen Reader | Usage Share | Priority |
| --- | --- | --- | --- |
| Windows | NVDA | ~40% | Required |
| Windows | JAWS | ~30% | Enterprise contexts |
| macOS/iOS | VoiceOver | ~15% | Apple users |
| Android | TalkBack | ~10% | Mobile users |

NVDA vs JAWS behavioral differences:

  • NVDA strictly follows DOM/accessibility tree—exposes missing labels, broken associations
  • JAWS uses heuristics to infer missing information—masks some issues but improves real-world usability

Test with both when possible. NVDA catches structural problems; JAWS reveals whether heuristics compensate for your issues (they shouldn’t be necessary).

Testing protocol:

  1. Navigate entire page in reading mode (not just interactive elements)
  2. Complete primary user flows (forms, checkout, search)
  3. Verify dynamic content announces (live regions, error states; a live-region sketch follows this list)
  4. Test error recovery (can user understand and fix input errors?)
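For protocol step 3, a common pattern to verify is a status message injected into an aria-live region, which screen readers announce without moving focus. A minimal sketch; the element id, class name, and message are illustrative:

live-region.js
// role="status" implies aria-live="polite": announcements wait for the user to pause
const status = document.createElement("div")
status.id = "form-status"
status.setAttribute("role", "status")
status.className = "visually-hidden" // assumes a standard visually-hidden utility class
document.body.append(status)

function announce(message) {
  // Clearing first helps some screen readers re-announce identical text
  status.textContent = ""
  requestAnimationFrame(() => {
    status.textContent = message
  })
}

announce("3 results found")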

These criteria require human judgment—no automation possible:

Alternative text (1.1.1): Does alt text convey the image’s purpose in context? “Graph showing sales data” is technically present but useless. “Sales increased 25% from Q1 to Q3” conveys meaning.

Meaningful sequence (1.3.2): Does reading order make sense when CSS positioning is ignored? Screen readers follow DOM order, not visual order.

Link purpose (2.4.4): Can users understand link destinations? “Click here” provides no context; “Download annual report (PDF, 2.4MB)” does.

Error suggestions (3.3.3): Do error messages explain how to fix the problem? “Invalid input” fails; “Email must include @ symbol” succeeds.

Not all accessibility issues have equal impact. Prioritize by user impact and legal risk:

| Severity | Definition | Examples | Response |
| --- | --- | --- | --- |
| Critical | Complete barrier—task cannot be completed | No keyboard access to submit button, missing form labels, keyboard trap | Fix immediately |
| Serious | Major difficulty—task very hard to complete | Poor contrast, confusing focus order, missing error identification | Fix this sprint |
| Moderate | Inconvenience—task harder than necessary | Redundant alt text, minor contrast issues, verbose labels | Fix this quarter |
| Minor | Best practice—not a barrier | Missing landmark roles, suboptimal heading levels | Backlog |

| WCAG Level | User Impact | Legal Risk |
| --- | --- | --- |
| Level A failures | Complete barriers—AT cannot function | High—baseline requirement |
| Level AA failures | Significant barriers—tasks very difficult | High—legal compliance target |
| Level AAA failures | Maximum accessibility—specialized contexts | Low—not universally required |

Track issues with sufficient context for developers:

## Issue: Missing label on email input
**Severity**: Critical
**WCAG**: 1.3.1 Info and Relationships (Level A)
**Page**: /checkout
**Tool**: axe-core (violations[0])
### Description
Email input field has no programmatic label. Screen reader users cannot identify the field's purpose.
### Current markup
<input type="email" name="email" placeholder="Email">
### Recommended fix
<label for="checkout-email">Email address</label>
<input type="email" id="checkout-email" name="email">
### Verification
- [ ] axe-core passes
- [ ] NVDA announces "Email address, edit"
- [ ] Label visible and associated

Build a multi-stage pipeline that catches issues at appropriate development phases:

package.json
{
  "lint-staged": {
    "*.{js,jsx,ts,tsx}": ["eslint --fix"]
  },
  "husky": {
    "hooks": {
      "pre-commit": "lint-staged"
    }
  }
}

eslint-plugin-jsx-a11y catches static violations before code enters the repository.

.github/workflows/pr.yml
name: PR Checks
on: pull_request

jobs:
  test:
    runs-on: ubuntu-latest
    steps:
      - uses: actions/checkout@v4
      - uses: actions/setup-node@v4
        with:
          node-version: "22"
      - run: npm ci
      - name: Lint (includes a11y rules)
        run: npm run lint
      - name: Unit + Component Tests (includes axe)
        run: npm test
      - name: E2E Tests (Playwright + axe-core)
        run: npx playwright test
.github/workflows/deploy.yml
name: Deploy
on:
  push:
    branches: [main]

jobs:
  audit:
    runs-on: ubuntu-latest
    steps:
      - uses: actions/checkout@v4
      - uses: actions/setup-node@v4
        with:
          node-version: "22"
      - run: npm ci
      - run: npm run build
      - run: npm run preview &
      - run: npx wait-on http://localhost:3000
      - name: Pa11y-CI (dual runner)
        run: npx pa11y-ci
      - name: Lighthouse CI
        run: npx lhci autorun
  deploy:
    needs: audit
    runs-on: ubuntu-latest
    steps:
      - name: Deploy to production
        run: ./deploy.sh

Configure thresholds that prevent regressions:

lighthouserc.js
module.exports = {
  ci: {
    assert: {
      assertions: {
        "categories:accessibility": ["error", { minScore: 0.9 }],
        // Specific audits
        "color-contrast": "error",
        "document-title": "error",
        "html-has-lang": "error",
        "meta-viewport": "error",
      },
    },
  },
}

Pa11y threshold:

.pa11yci.json
{
  "defaults": {
    "threshold": 0
  }
}

A threshold of 0 fails on any violation. For legacy codebases, start with the current violation count and reduce over time:

{
  "defaults": {
    "threshold": 15
  }
}

Accessibility testing requires defense in depth: static analysis catches syntax errors, runtime automation catches technical violations, and manual testing catches experiential issues. No single tool provides complete coverage because ~42% of WCAG criteria require human judgment.

Build your pipeline around this reality:

  1. Shift-left with eslint-plugin-jsx-a11y—catch obvious issues at development time
  2. Gate PRs with axe-core in Playwright/Cypress—prevent regressions from merging
  3. Audit with Pa11y-CI pre-deploy—dual-runner coverage catches more edge cases
  4. Manual test before releases—keyboard and screen reader testing are non-negotiable

The 57% automated coverage is a floor, not a ceiling. With disciplined manual testing, you can catch 80-90% of issues before users encounter them. The remaining 10-20% requires user testing with people who actually use assistive technology—but that’s a topic for another article.

Prerequisites

  • Familiarity with WCAG 2.2 success criteria structure (see WCAG 2.2 Practical Guide)
  • Experience with JavaScript testing frameworks (Jest, Playwright, or Cypress)
  • Basic understanding of assistive technology categories (screen readers, switch devices, voice control)
  • CI/CD pipeline concepts (GitHub Actions, GitLab CI, or similar)

Terminology

  • axe-core: Open-source accessibility testing engine by Deque, used by most modern testing integrations
  • HTML CodeSniffer (HTMLCS): Alternative accessibility rule engine used by Pa11y by default
  • AST (Abstract Syntax Tree): Code representation that eslint-plugin-jsx-a11y analyzes for static violations
  • incomplete results: axe-core’s category for issues requiring human review—not definite violations but potential problems
  • shift-left: Moving quality checks earlier in the development process (from deployment to development time)
  • quality gate: Automated check that prevents code progression (merge, deploy) if criteria aren’t met

Key Takeaways

  • Automated testing catches ~57% of accessibility issues—measured by issue volume, not criteria count
  • ~42% of WCAG criteria require human judgment for content quality, focus order logic, and user experience
  • axe-core dominates because its conservative rule engineering minimizes false positives, making it safe for CI/CD gates
  • Pa11y with dual runners (axe + HTMLCS) catches more edge cases than either engine alone
  • eslint-plugin-jsx-a11y provides shift-left coverage for JSX codebases—zero runtime cost
  • Manual testing is non-negotiable: keyboard navigation and screen reader testing cannot be automated effectively
  • Screen reader testing should use NVDA (catches structural issues) and JAWS (reveals heuristic compensation) when possible
  • Issue triage should prioritize by user impact (complete barrier vs. inconvenience) and WCAG level (A/AA violations first)

