[ceph-users] unusual growth in cluster after replacing journal SSDs

Burkhard Linke Burkhard.Linke at computational.bio.uni-giessen.de
Thu Nov 16 04:44:46 PST 2017


On 11/16/2017 01:36 PM, Jogi Hofmüller wrote:
> Dear all,
> for about a month we have been experiencing something strange in our
> small cluster.  Let me first describe what happened on the way.
> On Oct 4th smartmon told us that the journal SSDs in one of our two
> ceph nodes would fail.  Since getting replacements took way longer than
> expected we decided to place the journal on a spare HDD rather than
> have the SSD fail and leave us in an uncertain state.
> On Oct 17th we finally got the replacement SSDs.  First we replaced the
> broken/soon-to-be-broken SSD and moved the journals from the temporarily
> used HDD to the new SSD.  Then we also replaced the journal SSD on the
> other ceph node since it would probably fail sooner rather than later.
> We performed all operations by setting noout first, then taking down
> the OSDs, flushing journals, replacing disks, creating new journals and
> starting OSDs again.  We waited until the cluster was back in HEALTH_OK
> state before we proceeded to the next node.
> AFAIR mkjournal crashed once on the second node.  So we ran the command
> again and the journals were created.
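
For reference, the per-node procedure described above would correspond
roughly to the command sequence below. This is only a sketch: it builds
the list of commands and executes nothing. It assumes systemd-managed
FileStore OSDs (the ceph-osd@N unit names) and that noout is unset again
once the node is done, neither of which is stated in the original mail.
The function name and the OSD ids are made up for illustration.

```python
def journal_swap_commands(osd_ids):
    """Build (but do not run) the commands for swapping journals on one node."""
    # Keep the cluster from marking the stopped OSDs out and rebalancing.
    cmds = [["ceph", "osd", "set", "noout"]]
    for osd in osd_ids:
        # Assumption: OSDs run as systemd units named ceph-osd@<id>.
        cmds.append(["systemctl", "stop", f"ceph-osd@{osd}"])
        # Write all pending journal entries to the OSD's data store.
        cmds.append(["ceph-osd", "-i", str(osd), "--flush-journal"])
    # Manual step, not represented here: replace the disk and point each
    # OSD's journal at its new partition.
    for osd in osd_ids:
        cmds.append(["ceph-osd", "-i", str(osd), "--mkjournal"])
        cmds.append(["systemctl", "start", f"ceph-osd@{osd}"])
    # Assumption: noout is cleared after the node is back.
    cmds.append(["ceph", "osd", "unset", "noout"])
    return cmds


if __name__ == "__main__":
    for cmd in journal_swap_commands([0, 1]):  # hypothetical OSD ids
        print(" ".join(cmd))
```

If mkjournal crashes for one OSD, it is worth checking afterwards where
that OSD's journal actually ended up.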

> What remains is the growth of used data in the cluster.
> I put background information of our cluster and some graphs of
> different metrics on a wiki page:
>    https://wiki.mur.at/Dokumentation/CephCluster
> Basically we need to reduce the growth in the cluster, but since we are
> not sure what causes it we don't know where to start.

Just a wild guess (wiki page is not accessible yet):

Are you sure that the journals were created on the new SSD? If the 
journals were created as files in the OSD directory, their size might be 
accounted for in the cluster size report (assuming OSDs are reporting 
their free space, not a sum of all object sizes).
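
One way to check this on each node is to look at what the journal entry
in every OSD directory really is. The sketch below assumes the default
FileStore layout, i.e. /var/lib/ceph/osd/<cluster>-<id>/journal; adjust
the glob pattern if your OSDs are mounted elsewhere. The function name
is made up for illustration. A journal on an SSD partition should show
up as a block device (usually via a symlink), while a plain file lives
on the OSD's data disk and takes up space there.

```python
import os
import stat
from glob import glob


def classify_journal(path):
    """Return (kind, size_in_bytes) for a journal path.

    os.stat() follows symlinks, so a symlink to an SSD partition is
    classified by its target. The size is only reported for regular
    files, since only those occupy space inside the OSD filesystem.
    """
    try:
        st = os.stat(path)
    except OSError:
        # Path does not exist, or it is a dangling symlink.
        return ("missing", 0)
    if stat.S_ISBLK(st.st_mode):
        return ("block-device", 0)
    if stat.S_ISREG(st.st_mode):
        return ("file", st.st_size)
    return ("other", 0)


if __name__ == "__main__":
    # Assumption: default FileStore OSD mount points.
    for journal in sorted(glob("/var/lib/ceph/osd/*/journal")):
        kind, size = classify_journal(journal)
        target = os.path.realpath(journal)
        print(f"{journal} -> {target}: {kind}, {size} bytes")
```

Any OSD reported as "file" has its journal on the data disk rather than
on the SSD.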

