[ceph-users] HELP with some basics please

tim taler robur314 at gmail.com
Tue Dec 5 06:07:35 PST 2017


okay another day another nightmare ;-)

So far we discussed pools as bundles of:
- pool 1) 15 HDD-OSDs (consisting of a total of 25 actual HDDs: 5
single HDDs and ten raid0 pairs, as mentioned before)
- pool 2) 6 SSD-OSDs
Unfortunately (well), on the "physical" pool 1 there are two "logical"
pools (my wording here is maybe not cephish?)

now I wonder about the real free space on "the pool"...

ceph df tells me:

GLOBAL:
    SIZE       AVAIL      RAW USED     %RAW USED
    52806G     17457G     35349G       66.94
POOLS:
    NAME           ID     USED       %USED     MAX AVAIL     OBJECTS
    pool-1-HDD     9      995G       13.34     3232G         262134
    pool-2-HDD     10     14986G     69.86     3232G         3892481
    pool-3-SDD     12     1318G      55.94     519G          372618

Now how do I read this?
The sum of "MAX AVAIL" in the "POOLS" section is 6983 (3232+3232+519).
Okay, 6983*2 (since all three pools have a size of 2) is 13966.

The GLOBAL section, on the other hand, tells me I still have 17457G available.
17457-13966=3491
Where are the missing 3491 GB?
Or am I missing something (other than space and a sane setup, I mean :-)
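
For reference, the free-space arithmetic can be reproduced from the ceph df
figures with a short Python sketch. The variable and function names are made
up for illustration, and multiplying MAX AVAIL by the replica count is only
the rough estimate used in this mail, not necessarily how Ceph derives the
value. The second calculation rests on an assumption: since the two HDD pools
live on the same OSDs, their identical 3232G may be the same space reported
twice.

```python
# Figures copied from the `ceph df` output (unit "G" as printed by ceph).
GLOBAL_AVAIL = 17457
REPLICA_SIZE = 2  # all three pools have size=2

max_avail = {
    "pool-1-HDD": 3232,
    "pool-2-HDD": 3232,
    "pool-3-SDD": 519,
}


def raw_estimate(pool_names):
    """Rough raw-space estimate: sum of MAX AVAIL times the replica count."""
    return sum(max_avail[name] for name in pool_names) * REPLICA_SIZE


# Naive reading: every pool's MAX AVAIL is independent of the others.
naive_raw = raw_estimate(max_avail)

# Assumption: the two HDD pools share the same OSDs, so their 3232G is
# counted only once here.
shared_raw = raw_estimate(["pool-1-HDD", "pool-3-SDD"])

print("naive raw estimate: ", naive_raw, "gap:", GLOBAL_AVAIL - naive_raw)
print("shared raw estimate:", shared_raw, "gap:", GLOBAL_AVAIL - shared_raw)
```

Either way the pool figures do not add up to the GLOBAL AVAIL value, which is
the gap being asked about here.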

AND (!)
if the two times 3232G of available space reported for the "physical"
HDD pool is true,
then in this setup (two hosts) there would be only 3232G free on each host.
Given that the HDD-OSDs are 4TB in size - if one dies and the host
tries to restore the data
(as I learned yesterday, the data in this setup will ONLY be restored
on the host on which the OSD died)
then ...
it doesn't work, right?
Unless I could hope that - due to too few placement groups and the resulting
imbalance of space usage across the OSDs - the dead OSD was only 60% full
and not 85%,
and that only the real data will be rewritten (restored).
But even that seems impossible - given the imbalanced OSDs, the
fuller ones will hit total saturation
and - at least as I understand it now - after that (again, once the
first OSD is 100% full) I can't use the space left
on the other OSDs.
Right?
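
The worry about a dead OSD can be put into numbers with a small Python
sketch. Everything in it is hypothetical: the per-OSD usage figures, the
8-OSD host, the can_absorb helper and the 90% fill limit are assumptions for
illustration, not values read from this cluster. It also assumes the
surviving OSDs of the same host could be filled evenly up to the limit, which
is optimistic when the data is already unbalanced.

```python
def can_absorb(used_gb, capacity_gb, failed, limit_pct=90):
    """Check whether the surviving OSDs of one host can take over the data
    of a failed OSD without any of them exceeding limit_pct of its capacity.

    used_gb     -- used space per OSD on the host, in GB
    capacity_gb -- size of each OSD in GB (all assumed equal)
    failed      -- index of the OSD that died
    limit_pct   -- assumed fill limit per OSD, in percent
    Returns (fits, headroom_gb, to_move_gb).
    """
    limit_gb = capacity_gb * limit_pct // 100
    survivors = [u for i, u in enumerate(used_gb) if i != failed]
    # Free space below the limit, summed over all surviving OSDs.
    headroom_gb = sum(max(0, limit_gb - u) for u in survivors)
    to_move_gb = used_gb[failed]
    return to_move_gb <= headroom_gb, headroom_gb, to_move_gb


CAPACITY_GB = 4000  # "4TB" OSDs

# Hypothetical host with unevenly filled OSDs (85% down to 60%).
uneven_host = [3400, 3200, 3000, 2800, 2800, 2600, 2400, 2400]
# Hypothetical host where every OSD is already 85% full.
full_host = [3400] * 8

for label, host in (("uneven", uneven_host), ("full", full_host)):
    fits, headroom, to_move = can_absorb(host, CAPACITY_GB, failed=0)
    print(f"{label}: move {to_move}G into {headroom}G headroom -> fits={fits}")
```

Because real placement is not perfectly even, a result of fits=True here is
an upper bound at best; the fullest surviving OSD would reach the limit
before the summed headroom is used up.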

If all that is true (and PLEASE point out any mistake in my thinking)
then what I have here at the moment is
25 hard disks, of which NONE may fail, or the pool will at least stop
accepting writes.

Am I right? (feels like a reciprocal Russian roulette ... ONE chamber
WITHOUT a bullet ;-)

Now - sorry we are not finished yet (and yes this is true, I'm not
trying to make fun of you)

On top of all this I see a rapid decrease in available space that
is not consistent
with growing data inside the rbds living in this cluster, nor with a
growing number of rbds (we ONLY use rbds).
BUT someone is running snapshots.
How do I sum up the amount of space each snapshot is using?

Is it the sum of the USED column in the output of "rbd du --snap"?
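
One way to do the summing is to let rbd print JSON and add up the rows in
Python. The sketch below assumes that "rbd du --format json" returns an
"images" list whose rows carry "name", "used_size" (in bytes) and, for
snapshot rows only, a "snapshot" key. That shape should be checked against
the real output first. The SAMPLE data and the image names are made up.

```python
import json
from collections import defaultdict

# Made-up sample in the assumed shape of `rbd du --format json` output.
SAMPLE = """
{"images": [
  {"name": "vm-disk-1", "snapshot": "snap-a", "used_size": 8388608},
  {"name": "vm-disk-1", "snapshot": "snap-b", "used_size": 4194304},
  {"name": "vm-disk-1", "used_size": 12582912},
  {"name": "vm-disk-2", "used_size": 4194304}
]}
"""


def snapshot_usage(du_json):
    """Sum used bytes per image, split into snapshot rows and the head row."""
    report = json.loads(du_json)
    per_image = defaultdict(lambda: {"head": 0, "snapshots": 0})
    for row in report.get("images", []):
        # Rows without a "snapshot" key are treated as the live image.
        bucket = "snapshots" if "snapshot" in row else "head"
        per_image[row["name"]][bucket] += row["used_size"]
    return dict(per_image)


usage = snapshot_usage(SAMPLE)
for name, sizes in sorted(usage.items()):
    mib = {kind: size // (1024 * 1024) for kind, size in sizes.items()}
    print(f"{name}: head={mib['head']} MiB, snapshots={mib['snapshots']} MiB")
```

On a real cluster the JSON text would come from the rbd command instead of
SAMPLE, for example by saving its output to a file and reading that file.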

And what is the philosophy of snapshots in ceph?
An object is 4MB in size; if a single bit in that object changes, is the
whole object copied?
(the cluster is luminous, upgraded from jewel, so we use filestore on
xfs, not bluestore)

TIA

On Tue, Dec 5, 2017 at 11:10 AM, Stefan Kooman <stefan at bit.nl> wrote:
> Quoting tim taler (robur314 at gmail.com):
>> And I'm still puzzled about the implication of the cluster size on the
>> amount of OSD failures.
>> With size=2 min_size=1 one host could die and (if by chance there is
>> NO read error on any bit on the living host) I could (theoretically)
>> recover, is that right?
> True.
>> OR is it that if any two disks in the cluster fail at the same time
>> (or while one is still being rebuild) all my data would be gone?
> Only the objects that are located on those disks. So for example obj1
> disk1,host1 and obj 1 on disk2,host2 ... you will lose data, yes.
>
> Gr. Stefan
>
> --
> | BIT BV  http://www.bit.nl/        Kamer van Koophandel 09090351
> | GPG: 0xD14839C6                   +31 318 648 688 / info at bit.nl

