[ceph-users] Pool stats issue with upgrades to nautilus

Sage Weil sweil at redhat.com
Fri Jul 12 08:22:19 PDT 2019

Hi everyone,

All current Nautilus releases have an issue where deploying a single new 
(Nautilus) BlueStore OSD on an upgraded cluster (i.e. one that was 
originally deployed pre-Nautilus) breaks the pool utilization stats 
reported by ``ceph df``.  Until all OSDs have been reprovisioned or 
updated (via ``ceph-bluestore-tool repair``), the pool stats will show 
values that are lower than the true values.  A fix is in the works but will 
not appear until 14.2.3.  Users who have upgraded to Nautilus (or are 
considering upgrading) may want to delay provisioning new OSDs until the 
fix is available in the next release.

This issue will only affect you if:

- You started with a pre-Nautilus cluster and upgraded to Nautilus
- You then provisioned one or more new BlueStore OSDs, or ran 
  'ceph-bluestore-tool repair' on an upgraded OSD.

The symptom is that the pool stats from 'ceph df' are too small.  For 
example, the pre-upgrade stats on our test cluster were

    POOL                           ID      STORED      OBJECTS     USED        %USED     MAX AVAIL 
    data                             0      63 TiB      44.59M      63 TiB     30.21        48 TiB 

but when one OSD was updated it changed to

    POOL                           ID      STORED      OBJECTS     USED        %USED     MAX AVAIL 
    data                             0     558 GiB      43.50M     1.7 TiB      1.22        45 TiB 

The root cause is that, starting with Nautilus, BlueStore maintains 
per-pool usage stats, but it requires a slight on-disk format change; 
upgraded OSDs won't have the new stats until you run a ceph-bluestore-tool 
repair.  The problem is that the mon starts using the new stats as soon as 
*any* OSDs are reporting per-pool stats (instead of waiting until *all* 
OSDs are doing so).
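
To make the any-versus-all distinction concrete, here is a toy Python model.
It is not the mon code: the function names, the per-OSD byte counts, and the
single "legacy total" are all assumptions made up for illustration, and the
actual fix may be implemented differently.

```python
from typing import List, Optional

# Toy model only. Each entry is the per-pool byte count reported by one
# hypothetical OSD, or None if that OSD has not been converted yet and so
# reports no per-pool stats.

def pool_stored_any(reports: List[Optional[int]], legacy_total: int) -> int:
    """Switch to per-pool stats as soon as ANY OSD reports them."""
    reporting = [r for r in reports if r is not None]
    if reporting:
        # Unconverted OSDs contribute nothing, so the sum is too small.
        return sum(reporting)
    return legacy_total

def pool_stored_wait_for_all(reports: List[Optional[int]], legacy_total: int) -> int:
    """Switch to per-pool stats only once ALL OSDs report them."""
    if reports and all(r is not None for r in reports):
        return sum(reports)
    return legacy_total

# Ten OSDs holding 100 units each; only one has been converted.
reports = [100] + [None] * 9
print(pool_stored_any(reports, legacy_total=1000))
print(pool_stored_wait_for_all(reports, legacy_total=1000))
```

In this model the first function reports only the converted OSD's share,
which is the same kind of undercount as in the 'ceph df' output above.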

To avoid the issue, either

 - do not provision new BlueStore OSDs after the upgrade, or
 - update all OSDs to keep new per-pool stats.  An existing BlueStore
   OSD can be converted with

     systemctl stop ceph-osd@$N
     ceph-bluestore-tool repair --path /var/lib/ceph/osd/ceph-$N
     systemctl start ceph-osd@$N

   Note that FileStore does not support the new per-pool stats at all, so 
   if there are FileStore OSDs in your cluster there is no workaround 
   that doesn't involve replacing the FileStore OSDs with BlueStore.
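
If you have many OSDs to convert, the three commands above can be generated
per OSD.  The Python sketch below is only an illustration: the function
names and the dry-run default are my own choices, it assumes the default
/var/lib/ceph/osd/ceph-$N data path shown above, and it does not check
cluster health between OSDs, which you will probably want to do yourself.

```python
import subprocess
from typing import Iterable, List

def repair_commands(osd_id: int) -> List[List[str]]:
    """Return the stop / repair / start commands for one OSD, in order."""
    unit = f"ceph-osd@{osd_id}"
    path = f"/var/lib/ceph/osd/ceph-{osd_id}"  # assumed default data path
    return [
        ["systemctl", "stop", unit],
        ["ceph-bluestore-tool", "repair", "--path", path],
        ["systemctl", "start", unit],
    ]

def convert_osds(osd_ids: Iterable[int], dry_run: bool = True) -> None:
    """Convert OSDs one at a time; with dry_run=True only print the commands."""
    for osd_id in osd_ids:
        for cmd in repair_commands(osd_id):
            if dry_run:
                print(" ".join(cmd))
            else:
                # check=True raises on a non-zero exit, so a failed repair
                # stops the loop before the next command is run.
                subprocess.run(cmd, check=True)

# Print the commands for OSDs 0 and 1 without running anything.
convert_osds([0, 1])
```

Review the printed commands first, then pass dry_run=False on the host that
actually carries those OSDs.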

A fix[1] is working its way through QA and will appear in 14.2.3; it 
won't quite make the 14.2.2 release.


[1] https://github.com/ceph/ceph/pull/28978
