Application Health Checklist
Reviewed quarterly by the component owner.
Priority
- Must be fixed immediately; the on-call person does it
- Must be fixed in the next sprint; create a story and put it at the top of the backlog
- Must be fixed this quarter; the component owner is accountable for making sure it gets done
- Nice to have; track non-compliance but don't schedule the work
Application Design
| Line Item | Meets? | Priority | Notes |
| --- | --- | --- | --- |
| Properly focused microservice | | | |
| External dependencies minimized | | | |
| External dependencies have reasonable timeouts | | | |
| Uses event mesh to emit events | | | |
| Cloud located | | | |
| Uses error code/message standards | | | |
| Instrumented to use feature flagging | | | |
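
As an illustration of the "External dependencies have reasonable timeouts" item, here is a minimal sketch of putting a time budget around a dependency call. It is written in Python for illustration only; the names `call_with_timeout`, `DependencyTimeout`, and `slow_dependency` are hypothetical and not part of any existing service. Most HTTP and database clients expose their own timeout settings, and those are usually preferable where available.

```python
import concurrent.futures
import time


class DependencyTimeout(Exception):
    """Raised when a dependency call exceeds its time budget (hypothetical name)."""


def call_with_timeout(func, timeout_seconds, *args, **kwargs):
    """Run func(*args, **kwargs) and give up after timeout_seconds."""
    executor = concurrent.futures.ThreadPoolExecutor(max_workers=1)
    future = executor.submit(func, *args, **kwargs)
    try:
        return future.result(timeout=timeout_seconds)
    except concurrent.futures.TimeoutError as exc:
        raise DependencyTimeout(
            f"dependency did not respond within {timeout_seconds}s"
        ) from exc
    finally:
        # Do not wait for the worker thread; a timed-out call keeps running
        # in the background until it returns on its own.
        executor.shutdown(wait=False)


def slow_dependency():
    """Hypothetical stand-in for a slow downstream service."""
    time.sleep(0.5)
    return "late"
```

Calling `call_with_timeout(slow_dependency, 0.05)` raises `DependencyTimeout` instead of blocking the caller for the full half second. What counts as "reasonable" depends on the service's own SLOs, so the budget is left as a parameter.
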
Health
| Line Item | Meets? | Priority | Notes |
| --- | --- | --- | --- |
| Warning levels set to 3 | | | |
| No build warnings | | | |
| Functional tests run locally | | | |
| Functional tests run in integration | | | |
| Bug bash run within the past year | | | |
| Resiliency exercise run within the past year | | | |
| Tech debt backlog reviewed and reprioritized | | | |
| No packages scheduled for deprecation within the next quarter | | | |
| All NuGet packages are up to date on minor versions | | | |
| Major version NuGet updates evaluated for breaking changes | | | |
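
The two NuGet items above treat minor and major updates differently: minor updates are expected to be applied, while major updates need a review for breaking changes. A rough sketch of that classification follows, in Python for illustration only. It assumes plain `MAJOR.MINOR.PATCH` version strings without pre-release suffixes; `parse_version` and `classify_update` are hypothetical helper names, not part of any NuGet tooling.

```python
def parse_version(version: str) -> tuple:
    """Parse 'MAJOR.MINOR.PATCH' into a 3-tuple of ints, padding with zeros."""
    parts = [int(p) for p in version.split(".")[:3]]
    parts += [0] * (3 - len(parts))
    return tuple(parts)


def classify_update(installed: str, latest: str) -> str:
    """Return 'current', 'patch', 'minor', or 'major' for an available update."""
    current = parse_version(installed)
    newest = parse_version(latest)
    if newest <= current:
        return "current"
    if newest[0] > current[0]:
        # Major bump: evaluate for breaking changes before updating.
        return "major"
    if newest[1] > current[1]:
        # Minor bump: the checklist expects these to be applied.
        return "minor"
    return "patch"
```

A package reported as `"minor"` would fail the "up to date on minor versions" item, while `"major"` only signals that an evaluation is due.
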
Documentation
| Line Item | Meets? | Priority | Notes |
| --- | --- | --- | --- |
| Technical documentation up to date | | | |
| Physical documentation up to date | | | |
| OpenAPI documentation up to date | | | |
| Integrating clients documented | | | |
| Key schedule/rotation documented | | | |
| Credential schedule/rotation documented | | | |
| End-of-life schedule reviewed and documented | | | |
| Service proxies documented | | | |
| REST APIs have useful Swagger pages | | | |
| GraphQL APIs have useful GraphiQL pages | | | |
| Technical SLA/SLI/SLOs documented | | | |
| Business KPIs documented | | | |
| Load testing scenarios documented | | | |
| Support runbook filed and up to date with SREs | | | |
| Domain data elements/objects documented | | | |
| Error codes documented | | | |
| Associated workflow diagrams up to date | | | |
| Pre/post deployment checklists up to date | | | |
Observability
| Line Item | Meets? | Priority | Notes |
| --- | --- | --- | --- |
| App logs in Datadog | | | |
| IIS logging in Datadog | | | |
| APM in Datadog | | | |
| Network monitoring in ThousandEyes | | | |
| Webmon/ping monitoring | | | |
| Dashboard for SLA/SLO performance created and documented | | | |
| Dashboard for business KPIs created and documented | | | |
| End-to-end scenarios dashboarded in Datadog | | | |
| All requests logged on entry/exit | | | |
| App log volume minimized | | | |
| Logging at Information level and above in Stage and Prod | | | |
| Logging at Debug level below Stage | | | |
| Security logs go to the SIEM | | | |
| Service has ping endpoint | | | |
| Service has health check endpoint | | | |
| Ping endpoint doesn't require x-correlation-id | | | |
| Level 1 alerts created and documented | | | |
| Level 2 alerts created and documented | | | |
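
The two logging-level items above amount to a simple rule: Information and above in Stage and Prod, Debug everywhere below Stage. A minimal sketch of that rule is shown here, using Python's standard `logging` module for illustration only. The environment names in `INFO_ENVIRONMENTS`, the logger name `"app"`, and the function names are assumptions; substitute whatever the deployment pipeline actually uses.

```python
import logging

# Hypothetical environment names; everything else is treated as "below Stage".
INFO_ENVIRONMENTS = {"stage", "prod"}


def log_level_for(environment: str) -> int:
    """Return the minimum log level for the given environment name."""
    if environment.strip().lower() in INFO_ENVIRONMENTS:
        return logging.INFO
    return logging.DEBUG


def configure_logging(environment: str) -> logging.Logger:
    """Apply the environment's minimum level to the application logger."""
    logger = logging.getLogger("app")  # hypothetical logger name
    logger.setLevel(log_level_for(environment))
    return logger
```

With this in place, `configure_logging("prod")` drops Debug messages, which also helps with the "App log volume minimized" item, while a lower environment such as a hypothetical `"dev"` keeps them.
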
Testing
| Line Item | Meets? | Priority | Notes |
| --- | --- | --- | --- |
| Unit tests cover > 80% of the code base | | | |
| Unit tests run automatically on every build | | | |
| Functional tests run automatically after deployment | | | |
| Load tests can run with mocked dependencies | | | |
| Load tests can run with integrated dependencies | | | |
Security
| Line Item | Meets? | Priority | Notes |
| --- | --- | --- | --- |
| Application reviewed by cyber security within the past year | | | |
| No credentials in GitHub | | | |
| No plain text credentials in CI/CD | | | |
| No level 1 or PCI data logged | | | |
| GitHub permissions are appropriate | | | |
| App sec scans run on build | | | |
| Application uses appropriate authx | | | |
| Access lists are not overly permissive | | | |
| No logged high or critical security findings | | | |
| No logged medium or low security findings | | | |
| Pen test run within the past year | | | |
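
For the "No level 1 or PCI data logged" item, one possible safeguard is a logging filter that masks card-number-like digit runs before a message is written. The sketch below uses Python's standard `logging` module for illustration only. The pattern is a rough heuristic (13 to 19 digits, optionally separated by spaces or hyphens), and `redact` and `RedactCardNumbers` are hypothetical names. It does not cover other level 1 data and is not a substitute for keeping such data out of log calls in the first place.

```python
import logging
import re

# Heuristic: 13-19 digits, optionally separated by single spaces or hyphens.
_CARD_LIKE = re.compile(r"\b\d(?:[ -]?\d){12,18}\b")


def redact(text: str) -> str:
    """Replace card-number-like digit runs with a placeholder."""
    return _CARD_LIKE.sub("[REDACTED]", text)


class RedactCardNumbers(logging.Filter):
    """Logging filter that masks card-number-like strings in each record."""

    def filter(self, record: logging.LogRecord) -> bool:
        # Render the final message first so values passed as args are covered.
        record.msg = redact(record.getMessage())
        record.args = ()
        return True  # keep the record, now redacted
```

Attaching the filter with `handler.addFilter(RedactCardNumbers())` applies it to every record that handler emits. Short numbers such as order IDs are left untouched because they fall below the 13-digit minimum.
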