Application Health Checklist
Reviewed quarterly by the component owner.
Priority
- Must be fixed immediately; the on-call person does the work
- Must be fixed in the next sprint; create a story and put it at the top of the backlog
- Must be fixed this quarter; the component owner is accountable for making sure it gets done
- Nice to have; track non-compliance but don't schedule the work
Application Design
| Line Item | Meets? | Priority | Notes |
| --- | --- | --- | --- |
| Properly Focused Microservice | | | |
| External dependencies minimized | | | |
| External dependencies have reasonable timeouts | | | |
| Uses event mesh to emit events | | | |
| Cloud located | | | |
| Uses error code/message standards | | | |
| Instrumented to use feature flagging | | | |

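
As an illustration of the "reasonable timeouts" item, here is a minimal sketch of putting a time budget around a call to an external dependency. The names `call_with_timeout`, `DependencyTimeout`, and `slow_dependency` are hypothetical and not part of any standard. In practice the HTTP or database client's own timeout setting is usually the better place to enforce this; the wrapper only shows the idea.

```python
import concurrent.futures
import time


class DependencyTimeout(Exception):
    """Raised when a dependency call exceeds its time budget (hypothetical name)."""


def call_with_timeout(fn, timeout_seconds, *args, **kwargs):
    """Run fn(*args, **kwargs) and stop waiting after timeout_seconds."""
    executor = concurrent.futures.ThreadPoolExecutor(max_workers=1)
    future = executor.submit(fn, *args, **kwargs)
    try:
        return future.result(timeout=timeout_seconds)
    except concurrent.futures.TimeoutError as exc:
        raise DependencyTimeout(f"no response within {timeout_seconds}s") from exc
    finally:
        # Do not block the caller; the worker thread still runs to completion
        # in the background, so this bounds the wait, not the work.
        executor.shutdown(wait=False)


def slow_dependency():
    """Hypothetical stand-in for a slow downstream service."""
    time.sleep(0.3)
    return "late"
```

A caller would catch `DependencyTimeout` and fall back, retry, or return an error that follows the team's error code/message standards.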
Health
| Line Item | Meets? | Priority | Notes |
| --- | --- | --- | --- |
| Warning levels set to 3 | | | |
| No build warnings | | | |
| Functional tests run locally | | | |
| Functional tests run in integration | | | |
| Bug Bash run within the past year | | | |
| Resiliency exercise run within the past year | | | |
| Tech debt backlog reviewed and reprioritized | | | |
| No packages will be deprecated within the next quarter | | | |
| All NuGet packages are up to date on minor versions | | | |
| Major version NuGet updates evaluated for breaking changes | | | |

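
The two NuGet items treat minor and major updates differently: minor updates are expected to be applied, while major updates are evaluated first. A rough sketch of that classification follows, assuming packages use `MAJOR.MINOR.PATCH` version strings. The helpers `parse_version` and `classify_update` are hypothetical, and pre-release or build suffixes are ignored for simplicity.

```python
def parse_version(text):
    """Parse 'MAJOR.MINOR.PATCH' into a tuple of ints, dropping any suffix."""
    core = text.split("-", 1)[0].split("+", 1)[0]
    parts = [int(p) for p in core.split(".")]
    while len(parts) < 3:
        parts.append(0)  # treat '6.2' as '6.2.0'
    return tuple(parts[:3])


def classify_update(current, latest):
    """Return 'up-to-date', 'major', 'minor', or 'patch' for an available update."""
    cur, new = parse_version(current), parse_version(latest)
    if new <= cur:
        return "up-to-date"
    if new[0] != cur[0]:
        return "major"  # evaluate for breaking changes before upgrading
    if new[1] != cur[1]:
        return "minor"  # expected to be applied per the checklist
    return "patch"
```

For example, `classify_update("6.0.1", "6.2.0")` yields `"minor"`, which would count against the "up to date on minor versions" item, while `classify_update("6.0.1", "7.0.0")` yields `"major"` and only calls for an evaluation.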
Documentation
| Line Item | Meets? | Priority | Notes |
| --- | --- | --- | --- |
| Technical documentation up to date | | | |
| Physical documentation up to date | | | |
| OpenAPI documentation up to date | | | |
| Integrating clients documented | | | |
| Key schedule/rotation documented | | | |
| Credential schedule/rotation documented | | | |
| End-of-life schedule reviewed and documented | | | |
| Service proxies documented | | | |
| REST APIs have useful Swagger pages | | | |
| GraphQL APIs have useful GraphiQL pages | | | |
| Technical SLA/SLI/SLOs documented | | | |
| Business KPIs documented | | | |
| Load testing scenarios documented | | | |
| Support Runbook filed and up to date with SREs | | | |
| Domain data elements/objects documented | | | |
| Error codes documented | | | |
| Associated workflow diagrams up to date | | | |
| Pre/post deployment checklists up to date | | | |

Observability
| Line Item | Meets? | Priority | Notes |
| --- | --- | --- | --- |
| App logs in Datadog | | | |
| IIS logging in Datadog | | | |
| APM in Datadog | | | |
| Network monitoring in ThousandEyes | | | |
| Webmon/ping monitoring | | | |
| Dashboard for SLA/SLO performance created and documented | | | |
| Dashboard for business KPIs created and documented | | | |
| End-to-end scenarios dashboarded in Datadog | | | |
| All requests logged on entry/exit | | | |
| App log volume minimized | | | |
| Logging at Information level and above in Stage and Prod | | | |
| Logging at Debug level below Stage | | | |
| Security logs go to the SIEM | | | |
| Service has ping endpoint | | | |
| Service has health check endpoint | | | |
| Ping endpoint doesn't require x-correlation-id | | | |
| Level 1 alerts created and documented | | | |
| Level 2 alerts created and documented | | | |

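
The ping, health check, and x-correlation-id items can be illustrated with a small framework-free sketch. The paths `/ping` and `/health`, the response bodies, and the policy that every endpoint other than ping rejects requests without the header are assumptions made for the example, not something this checklist specifies.

```python
import json

CORRELATION_HEADER = "x-correlation-id"


def handle_request(path, headers, dependencies_ok=True):
    """Return (status_code, body) for a request; routing sketch only."""
    # HTTP header names are case-insensitive, so normalize before lookup.
    normalized = {name.lower(): value for name, value in headers.items()}

    # Ping answers before the header check, so monitors can call it bare.
    if path == "/ping":
        return 200, "pong"

    # Assumed policy: all other endpoints require the correlation id.
    if CORRELATION_HEADER not in normalized:
        return 400, json.dumps({"error": "missing x-correlation-id"})

    if path == "/health":
        status = 200 if dependencies_ok else 503
        body = {
            "status": "healthy" if dependencies_ok else "unhealthy",
            "correlationId": normalized[CORRELATION_HEADER],
        }
        return status, json.dumps(body)

    return 404, json.dumps({"error": "not found"})
```

Handling ping ahead of the header check keeps simple webmon/ping monitors working without custom headers, while the health check can report on dependencies and still carry a correlation id for tracing.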
Testing
| Line Item | Meets? | Priority | Notes |
| --- | --- | --- | --- |
| Unit tests cover > 80% of the code base | | | |
| Unit tests run automatically on every build | | | |
| Functional tests run automatically after deployment | | | |
| Load tests can run with mocked dependencies | | | |
| Load tests can run with integrated dependencies | | | |

Security
| Line Item | Meets? | Priority | Notes |
| --- | --- | --- | --- |
| Application reviewed by cyber security within the past year | | | |
| No credentials in GitHub | | | |
| No plain text credentials in CI/CD | | | |
| No level 1 or PCI data logged | | | |
| GitHub permissions are appropriate | | | |
| App sec scans run on build | | | |
| Application uses appropriate authx | | | |
| Access lists are not overly permissive | | | |
| No logged high or critical security findings | | | |
| No logged medium or low security findings | | | |
| Pen test run within the past year | | | |