MunkyC Musings

Application Health Checklist

Reviewed quarterly by the component owner.

Priority

  1. Must be fixed immediately; the on-call person does it
  2. Must be fixed in the next sprint; create a story and put it at the top of the backlog
  3. Must be fixed this quarter; the component owner is accountable for making sure it gets done
  4. Nice to have; track non-compliance but don't schedule the work
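The priority scheme maps onto the Meets? and Priority columns of each table. Here is a minimal sketch, assuming each row is recorded as a small Python object. The `LineItem`, `Priority`, and `triage` names are hypothetical. It lists every failing item with the action its priority calls for, most urgent first.

```python
from dataclasses import dataclass
from enum import IntEnum
from typing import List, Tuple


class Priority(IntEnum):
    """The four checklist priorities; a lower number is more urgent."""
    IMMEDIATE = 1
    NEXT_SPRINT = 2
    THIS_QUARTER = 3
    NICE_TO_HAVE = 4


# What happens to a line item that does not meet the bar, by priority.
ACTIONS = {
    Priority.IMMEDIATE: "on-call person fixes it now",
    Priority.NEXT_SPRINT: "story at the top of the backlog for next sprint",
    Priority.THIS_QUARTER: "component owner ensures it is fixed this quarter",
    Priority.NICE_TO_HAVE: "track non-compliance; do not schedule",
}


@dataclass
class LineItem:
    """One row of a checklist table: Line Item, Meets?, Priority, Notes."""
    name: str
    meets: bool
    priority: Priority
    notes: str = ""


def triage(items: List[LineItem]) -> List[Tuple[str, str]]:
    """Return (line item, action) for every failing item, most urgent first."""
    failing = sorted((item for item in items if not item.meets), key=lambda item: item.priority)
    return [(item.name, ACTIONS[item.priority]) for item in failing]
```

Items that meet the bar drop out, and the sort puts on-call work ahead of everything else. The component owner still assigns the actual priority values during the quarterly review.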

Application Design

Line Item Meets? Priority Notes
Properly focused microservice
External dependencies minimized
External dependencies have reasonable timeouts
Uses event mesh to emit events
Hosted in the cloud
Uses error code/message standards
Instrumented to use feature flagging
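For the timeout line item, here is a minimal Python sketch of calling an external dependency with an explicit timeout. The function name and the 2-second default are hypothetical. The point is that a slow or dead dependency produces a fast "unavailable" result instead of a hung request. Note that `urllib`'s `timeout` bounds each blocking socket operation (the connect and each read), not the request as a whole.

```python
import socket
import urllib.error
import urllib.request
from typing import Optional

# Hypothetical default; derive the real value from the dependency's SLO and
# this service's own latency budget.
DEFAULT_TIMEOUT_SECONDS = 2.0


def call_dependency(url: str, timeout: float = DEFAULT_TIMEOUT_SECONDS) -> Optional[bytes]:
    """GET `url`, giving up if any single connect or read blocks longer than `timeout`.

    Returns the response body, or None if the dependency is unreachable,
    too slow, or answers with an HTTP error status.
    """
    try:
        with urllib.request.urlopen(url, timeout=timeout) as response:
            return response.read()
    except (urllib.error.URLError, socket.timeout):
        # URLError covers refused or unreachable connections (and HTTPError, its subclass);
        # socket.timeout covers a dependency that accepts the connection but never answers.
        return None
```

Returning None keeps the sketch short. A real service would more likely map this to one of its standard error codes, so callers can tell "dependency down" apart from other failures.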

Health

Line Item Meets? Priority Notes
Compiler warning level set to 3
No build warnings
Functional tests run locally
Functional tests run in integration
Bug bash run within the past year
Resiliency exercise run within the past year
Tech debt backlog reviewed and reprioritized
No packages will be deprecated within the next quarter
All NuGet packages are up to date on minor versions
Major-version NuGet updates evaluated for breaking changes
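The last two Health items split package updates by version component. Here is a minimal sketch of that split, assuming stable NuGet versions that follow SemVer numbering. The current and latest versions could come from `dotnet list package --outdated`, for example. The function names are hypothetical. Whether a major bump really breaks anything is up to the package author, which is why it gets evaluated rather than taken automatically.

```python
from typing import Tuple


def _parse(version: str) -> Tuple[int, ...]:
    """Parse a stable NuGet-style version ("6.0.1" or "4.5.0.1") into four integers.

    Prerelease labels ("-beta") and build metadata ("+abc") are dropped, so this
    sketch is only meaningful for stable releases.
    """
    core = version.split("+")[0].split("-")[0]
    parts = [int(piece) for piece in core.split(".")]
    parts += [0] * (4 - len(parts))  # pad "6.0" or "6.0.1" out to four parts
    return tuple(parts[:4])


def classify_update(current: str, latest: str) -> str:
    """Return "current", "minor" (a minor or patch bump), or "major"."""
    cur, new = _parse(current), _parse(latest)
    if new <= cur:
        return "current"
    if new[0] > cur[0]:
        return "major"  # evaluate for breaking changes before taking it
    return "minor"      # expected to be non-breaking under SemVer; keep up to date
```

Patch bumps count as "minor" here because the checklist asks for packages to be current on minor versions, and that includes patches.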

Documentation

Line Item Meets? Priority Notes
Technical documentation up to date
Physical documentation up to date
OpenAPI documentation up to date
Integrating clients documented
Key schedule/rotation documented
Credential schedule/rotation documented
End-of-life schedule reviewed and documented
Service proxies documented
REST APIs have useful Swagger pages
GraphQL APIs have useful GraphiQL pages
Technical SLA/SLI/SLOs documented
Business KPIs documented
Load testing scenarios documented
Support runbook filed with the SREs and up to date
Domain data elements/objects documented
Error codes documented
Associated workflow diagrams up to date
Pre/post deployment checklists up to date
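"Useful Swagger pages" is a judgment call, but part of it can be checked against the OpenAPI document the page is generated from. Here is a minimal sketch. The bar it applies is an assumption: every operation has a summary or description, and every operation documents at least one error response (4xx, 5xx, or `default`). The function name is hypothetical.

```python
from typing import Any, Dict, List

# Operation keys defined by the OpenAPI 3 Path Item Object.
HTTP_METHODS = {"get", "put", "post", "delete", "options", "head", "patch", "trace"}


def documentation_gaps(spec: Dict[str, Any]) -> List[str]:
    """List operations in an OpenAPI document that would make for a thin Swagger page."""
    gaps = []
    for path, path_item in spec.get("paths", {}).items():
        for method, operation in path_item.items():
            if method not in HTTP_METHODS:
                continue  # skip path-level keys such as "parameters" or "summary"
            name = f"{method.upper()} {path}"
            if not (operation.get("summary") or operation.get("description")):
                gaps.append(f"{name}: no summary or description")
            codes = [str(code) for code in operation.get("responses", {})]
            if not any(code == "default" or code[:1] in ("4", "5") for code in codes):
                gaps.append(f"{name}: no error responses documented")
    return gaps
```

Load the service's OpenAPI JSON with `json.load` and pass the resulting dict in. An empty list means nothing obvious is missing. It does not mean the descriptions are any good.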

Observability

Line Item Meets? Priority Notes
App logs in Datadog
IIS logging in Datadog
APM in Datadog
Network monitoring in ThousandEyes
Webmon/ping monitoring
Dashboard for SLA/SLO performance created and documented
Dashboard for business KPIs created and documented
End-to-end scenarios dashboarded in Datadog
All requests logged on entry/exit
App log volume minimized
Logging at Information level and above in Stage and Prod
Logging at Debug level in environments below Stage
Security logs go to the SIEM
Service has ping endpoint
Service has health check endpoint
Ping endpoint doesn't require x-correlation-id
Level 1 alerts created and documented
Level 2 alerts created and documented
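Several Observability items fit together in one small service: ping and health check endpoints, a ping that works without `x-correlation-id`, entry/exit logging for every request, and log levels by environment. Here is a minimal sketch using only Python's standard library. The routes `/ping` and `/health`, the `APP_ENVIRONMENT` variable, and `check_dependencies` are hypothetical. Requiring the correlation id on `/health` is an assumption, made here to contrast it with ping.

```python
import json
import logging
import os
import threading
from http.server import BaseHTTPRequestHandler, ThreadingHTTPServer


def log_level_for(environment: str) -> int:
    """Information and above in Stage and Prod; Debug in lower environments."""
    return logging.INFO if environment.lower() in ("stage", "prod") else logging.DEBUG


# APP_ENVIRONMENT is a hypothetical variable name; use whatever your platform sets.
logging.basicConfig(level=log_level_for(os.environ.get("APP_ENVIRONMENT", "dev")))
log = logging.getLogger("service")


def check_dependencies() -> dict:
    """Hypothetical stand-in: replace with real, short-timeout checks of each dependency."""
    return {"database": "ok"}


class Handler(BaseHTTPRequestHandler):
    def do_GET(self) -> None:
        correlation_id = self.headers.get("x-correlation-id")  # lookup is case-insensitive
        log.info("enter %s %s correlation_id=%s", self.command, self.path, correlation_id)
        self._status = 500  # overwritten by _send; stays 500 if routing raises
        try:
            self._route(correlation_id)
        finally:
            log.info("exit %s %s status=%d correlation_id=%s",
                     self.command, self.path, self._status, correlation_id)

    def _route(self, correlation_id) -> None:
        if self.path == "/ping":
            # Liveness only: no dependency calls and no correlation id required,
            # so simple ping monitors can call it.
            self._send(200, {"status": "ok"})
        elif not correlation_id:
            self._send(400, {"error": "x-correlation-id header required"})
        elif self.path == "/health":
            # Health check: reports each dependency; 503 if any is unhealthy.
            checks = check_dependencies()
            healthy = all(state == "ok" for state in checks.values())
            self._send(200 if healthy else 503, {"healthy": healthy, "checks": checks})
        else:
            self._send(404, {"error": "not found"})

    def _send(self, status: int, body: dict) -> None:
        self._status = status  # remembered for the exit log line
        payload = json.dumps(body).encode("utf-8")
        self.send_response(status)
        self.send_header("Content-Type", "application/json")
        self.send_header("Content-Length", str(len(payload)))
        self.end_headers()
        self.wfile.write(payload)

    def log_message(self, format, *args) -> None:
        pass  # silence the built-in stderr access log; the entry/exit lines replace it


def start_server(port: int = 8080) -> ThreadingHTTPServer:
    """Serve on localhost in a background thread; port 0 picks a free port."""
    server = ThreadingHTTPServer(("127.0.0.1", port), Handler)
    threading.Thread(target=server.serve_forever, daemon=True).start()
    return server
```

Ping stays cheap and dependency-free, so it answers "is the process up?" The health check answers "can it do its job?" Logging entry and exit at Information level also logs every ping. If that works against "App log volume minimized", dropping ping from the entry/exit logs is a reasonable trade-off.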

Testing

Line Item Meets? Priority Notes
Unit tests cover more than 80% of the code base
Unit tests run automatically on every build
Functional tests run automatically after deployment
Load tests can run with mocked dependencies
Load tests can run with integrated dependencies
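The coverage bar is easiest to enforce as a build step that fails when coverage drops. Here is a minimal sketch, assuming the coverage tool writes a Cobertura-format XML report (coverlet, common in .NET projects, can). Cobertura puts overall line coverage in the `line-rate` attribute of the root `<coverage>` element. The function names and the command-line usage are hypothetical.

```python
import sys
import xml.etree.ElementTree as ET

# The checklist bar: unit tests cover more than 80% of the code base.
COVERAGE_THRESHOLD = 0.80


def line_coverage(cobertura_xml: str) -> float:
    """Read overall line coverage (0.0 to 1.0) from a Cobertura-format report."""
    root = ET.fromstring(cobertura_xml)
    return float(root.attrib["line-rate"])


def coverage_gate(cobertura_xml: str, threshold: float = COVERAGE_THRESHOLD) -> bool:
    """True only if coverage is strictly above the threshold."""
    return line_coverage(cobertura_xml) > threshold


if __name__ == "__main__":
    # Usage in CI: python coverage_gate.py coverage.cobertura.xml
    with open(sys.argv[1], encoding="utf-8") as report:
        sys.exit(0 if coverage_gate(report.read()) else 1)
```

The comparison is strict because the line item says "more than 80%". A report at exactly 0.80 fails the build.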

Security

Line Item Meets? Priority Notes
Application reviewed by cyber security within the past year
No credentials in GitHub
No plain text credentials in CI/CD
No level 1 or PCI data logged
GitHub permissions are appropriate
App sec scans run on build
Application uses appropriate authentication and authorization (authx)
Access lists are not overly permissive
No logged high or critical security findings
No logged medium or low security findings
Pen test run within the past year
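A last line of defense for "No level 1 or PCI data logged" is redacting card numbers before log lines are written. Here is a minimal sketch using a Python `logging.Filter`. It treats a 13 to 19 digit run (optionally separated by spaces or hyphens) that passes the Luhn checksum as a primary account number (PAN). It covers only card numbers, not other level 1 data. The class and function names are hypothetical.

```python
import logging
import re

# Candidate PANs: 13 to 19 digits, optionally separated by single spaces or hyphens.
_PAN_CANDIDATE = re.compile(r"\b(?:\d[ -]?){12,18}\d\b")


def luhn_valid(digits: str) -> bool:
    """Luhn checksum: double every second digit from the right, then sum mod 10."""
    total = 0
    for position, char in enumerate(reversed(digits)):
        value = int(char)
        if position % 2 == 1:
            value *= 2
            if value > 9:
                value -= 9
        total += value
    return total % 10 == 0


def redact_pans(text: str) -> str:
    """Replace Luhn-valid card-number candidates with a fixed marker."""
    def replace(match):
        digits = re.sub(r"[ -]", "", match.group())
        return "[REDACTED PAN]" if luhn_valid(digits) else match.group()
    return _PAN_CANDIDATE.sub(replace, text)


class PanRedactingFilter(logging.Filter):
    """Redacts card numbers from the formatted message of each record it sees."""

    def filter(self, record: logging.LogRecord) -> bool:
        record.msg = redact_pans(record.getMessage())
        record.args = ()  # the arguments are already merged into msg
        return True
```

Attach the filter to each handler rather than to a logger. Logger-level filters do not run for records propagated from child loggers. The sketch does not touch exception tracebacks, and a long numeric ID that happens to pass Luhn will be redacted too. It supplements keeping PCI data out of log calls. It does not replace that.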