[ceph-users] mount cephfs on ceph servers

Hector Martin hector at marcansoft.com
Tue Mar 12 02:06:33 PDT 2019


It's worth noting that most containerized deployments can effectively 
limit RAM for containers (cgroups), and the kernel has limits on how 
many dirty pages it can keep around.

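In particular, /proc/sys/vm/dirty_ratio (default: 20) means that at most 
about 20% of memory can be dirty FS pages before writers are forced to 
flush. (Strictly, the kernel applies the percentage to available memory, 
i.e. free plus reclaimable pages, not to total RAM, so the real limit is 
somewhat lower.) If you set up your containers such that their cumulative 
memory usage is capped below, say, 70% of RAM, then this might 
effectively guarantee that you will never hit this issue.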
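
As a rough illustration of that arithmetic, here is a minimal Python 
sketch. The helper names and the example container caps are made up for 
illustration, and it treats dirty_ratio as a percentage of total RAM, 
which overestimates the dirty page budget and so errs on the safe side. 
It is a back-of-the-envelope check, not a guarantee against the deadlock.

```python
"""Rough check: do container memory caps leave room for dirty FS pages?"""


def read_int(path):
    """Read the first integer from a procfs file such as dirty_ratio."""
    with open(path) as f:
        return int(f.read().split()[0])


def total_ram_bytes(meminfo_path="/proc/meminfo"):
    """Return MemTotal from /proc/meminfo, converted from kB to bytes."""
    with open(meminfo_path) as f:
        for line in f:
            if line.startswith("MemTotal:"):
                return int(line.split()[1]) * 1024
    raise RuntimeError("MemTotal not found in " + meminfo_path)


def dirty_budget_bytes(total_ram, dirty_ratio_percent):
    """Upper bound on dirty pages, assuming the ratio applies to all RAM."""
    return total_ram * dirty_ratio_percent // 100


def headroom_bytes(total_ram, container_caps, dirty_ratio_percent):
    """RAM left after all container caps and the dirty page budget.

    A negative result means the caps plus dirty pages could exceed RAM.
    """
    budget = dirty_budget_bytes(total_ram, dirty_ratio_percent)
    return total_ram - sum(container_caps) - budget


if __name__ == "__main__":
    try:
        ram = total_ram_bytes()
        ratio = read_int("/proc/sys/vm/dirty_ratio")
    except (OSError, ValueError, RuntimeError) as exc:
        print("could not read kernel settings:", exc)
    else:
        if ratio == 0:
            # dirty_ratio reads as 0 when vm.dirty_bytes is set instead.
            print("vm.dirty_bytes is in use; dirty_ratio does not apply")
        # Hypothetical setup: 7 containers, each capped at 10% of RAM.
        caps = [ram // 10] * 7
        print("headroom in bytes:", headroom_bytes(ram, caps, ratio))
```

With the default ratio of 20 and caps summing to 70% of RAM, the result 
is about 10% of RAM left over for the kernel and everything else.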

On 08/03/2019 02:17, Tony Lill wrote:
> AFAIR the issue is that under memory pressure, the kernel will ask
> cephfs to flush pages, but that this in turn causes the osd (mds?) to
> require more memory to complete the flush (for network buffers, etc). As
> long as cephfs and the OSDs are feeding from the same kernel mempool,
> you are susceptible. Containers don't protect you, but a full VM, like
> xen or kvm? would.
> 
> So if you don't hit the low memory situation, you will not see the
> deadlock, and you can run like this for years without a problem. I have.
> But you are most likely to run out of memory during recovery, so this
> could compound your problems.
> 
> On 3/7/19 3:56 AM, Marc Roos wrote:
>>   
>>
>> Container =  same kernel, problem is with processes using the same
>> kernel.
>>
>> -----Original Message-----
>> From: Daniele Riccucci [mailto:devster at posteo.net]
>> Sent: 07 March 2019 00:18
>> To: ceph-users at lists.ceph.com
>> Subject: Re: [ceph-users] mount cephfs on ceph servers
>>
>> Hello,
>> is the deadlock risk still an issue in containerized deployments? For
>> example with OSD daemons in containers and mounting the filesystem on
>> the host machine?
>> Thank you.
>>
>> Daniele
>>
>> On 06/03/19 16:40, Jake Grimmett wrote:
>>> Just to add "+1" on this datapoint, based on one month usage on Mimic
>>> 13.2.4 essentially "it works great for us"
>>>
>>> Prior to this, we had issues with the kernel driver on 12.2.2. This
>>> could have been due to limited RAM on the osd nodes (128GB / 45 OSD),
>>> and an older kernel.
>>>
>>> Upgrading the RAM to 256GB and using a RHEL 7.6 derived kernel has
>>> allowed us to reliably use the kernel driver.
>>>
>>> We keep 30 snapshots ( one per day), have one active metadata server,
>>> and change several TB daily - it's much, *much* faster than with fuse.
>>>
>>> Cluster has 10 OSD nodes, currently storing 2PB, using ec 8:2 coding.
>>>
>>> ta ta
>>>
>>> Jake
>>>
>>> On 3/6/19 11:10 AM, Hector Martin wrote:
>>>> On 06/03/2019 12:07, Zhenshi Zhou wrote:
>>>>> Hi,
>>>>>
>>>>> I'm gonna mount cephfs from my ceph servers for some reason,
>>>>> including monitors, metadata servers and osd servers. I know it's
>>>>> not a best practice. But what is the exact potential danger if I
>>>>> mount cephfs from its own server?
>>>>
>>>> As a datapoint, I have been doing this on two machines (single-host
>>>> Ceph clusters) for months with no ill effects. The FUSE client
>>>> performs a lot worse than the kernel client, so I switched to the
>>>> latter, and it's been working well with no deadlocks.
>>>>
>>> _______________________________________________
>>> ceph-users mailing list
>>> ceph-users at lists.ceph.com
>>> http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com
>>>

-- 
Hector Martin (hector at marcansoft.com)
Public Key: https://mrcn.st/pub

