CEPH deep-scrubbing is sometimes brutal — PGs eventually drift past the max deep-scrub interval, and then they all happily go into deep-scrub at the exact same time. This most likely happens at the worst possible moment (hello there, Murphy), bringing your available IO to its knees just when you need it most (which is practically “any time”, right?).
To avoid this, you can schedule your deep-scrubbing at times that suit your business rules. The idea is to set your OSDs’ osd_deep_scrub_interval high enough that it never actually kicks in, and deep-scrub your PGs yourself, on a shorter cycle, when the time seems right; for instance:
- osd deep scrub interval = 2419200 #That’s 4 weeks
- Deep-scrub everything with a 1 week period.
To spread the load further, deep-scrub only a given share of your PGs at a time, with no more than a fixed number of deep-scrub tasks running in parallel; e.g. deep-scrub the “oldest” 1/7th of your PGs once a day, with never more than 2 scrubs running at once.
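The “oldest 1/7th” selection can be sketched in a few lines of Python. This is a minimal sketch, not the script itself: it assumes the PG entries are shaped like those in `ceph pg dump --format json` (field names `pgid` and `last_deep_scrub_stamp` assumed), and leaves the actual `ceph pg deep-scrub <pgid>` calls and the MAXSCRUBS throttling to the caller.

```python
from datetime import datetime

def pgs_to_scrub(pg_stats, share=7):
    """Return the pgids of the oldest 1/`share` of PGs by last deep-scrub time.

    `pg_stats` is a list of dicts shaped like the pg_stats entries of
    `ceph pg dump --format json` (assumed fields: "pgid" and
    "last_deep_scrub_stamp", e.g. "2015-06-01 12:00:00.000000").
    """
    def stamp(pg):
        # Drop sub-second precision so strptime stays simple.
        return datetime.strptime(pg["last_deep_scrub_stamp"][:19],
                                 "%Y-%m-%d %H:%M:%S")
    # Oldest deep-scrub stamps first.
    ordered = sorted(pg_stats, key=stamp)
    # Scrub at least one PG per run, even on tiny clusters.
    count = max(1, len(ordered) // share)
    return [pg["pgid"] for pg in ordered[:count]]
```

The caller would then loop over the returned pgids, issuing `ceph pg deep-scrub` while counting active scrubs to stay under the parallelism cap.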
The script below does just that.
It does require python to parse the ceph JSON output. Adapt the “MAXSCRUBS” variable for parallelism, call it with an argument defining the share of all active PGs to be scrubbed in one run (or just use the default value of 7), run it from a cron job and you’re good to go.
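For the cron part, assuming the script is saved as /usr/local/bin/ceph-deep-scrub.sh (a hypothetical path — use your own), an /etc/cron.d entry for a nightly run could look like:

```
# Nightly planned deep-scrub at 03:00, covering 1/7th of the PGs per run.
# Path, user and schedule are examples; adapt them to your environment.
0 3 * * * root /usr/local/bin/ceph-deep-scrub.sh 7
```

With a share of 7 and a daily run, every PG gets deep-scrubbed roughly once a week, matching the example above.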
Don’t forget to increase your osd_deep_scrub_interval so Ceph itself does not mess with your careful planning.
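In ceph.conf that is a one-line fragment (value in seconds, taken from the example above):

```
[osd]
# Push the automatic deep-scrub interval out to 4 weeks so the scheduled
# script, not Ceph, decides when PGs get deep-scrubbed.
osd deep scrub interval = 2419200
```

On a running cluster you should also be able to inject it without restarting the OSDs, e.g. with `ceph tell osd.* injectargs '--osd-deep-scrub-interval 2419200'` (check the exact option syntax against your Ceph release).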
CEPH: planned deep-scrubbing script