[ceph-users] CEPH OSD Restarts taking too long v10.2.9

Nikhil R nikh.ravindra at gmail.com
Fri Mar 29 07:13:24 PDT 2019


We have maxed out the files per directory, so Ceph is attempting an online split,
and the OSDs are crashing because of it. For now we have increased
filestore_split_multiple and filestore_merge_threshold and are restarting the OSDs.
On these restarts, however, the leveldb compaction is taking a long time. Below are
some of the logs.

2019-03-29 06:25:37.082055 7f3c6320a8c0  0
genericfilestorebackend(/var/lib/ceph/osd/ceph-83) detect_features: FIEMAP
ioctl is disabled via 'filestore fiemap' config option
2019-03-29 06:25:37.082064 7f3c6320a8c0  0
genericfilestorebackend(/var/lib/ceph/osd/ceph-83) detect_features:
SEEK_DATA/SEEK_HOLE is disabled via 'filestore seek data hole' config option
2019-03-29 06:25:37.082079 7f3c6320a8c0  0
genericfilestorebackend(/var/lib/ceph/osd/ceph-83) detect_features: splice
is supported
2019-03-29 06:25:37.096658 7f3c6320a8c0  0
genericfilestorebackend(/var/lib/ceph/osd/ceph-83) detect_features:
syncfs(2) syscall fully supported (by glibc and kernel)
2019-03-29 06:25:37.096703 7f3c6320a8c0  0
xfsfilestorebackend(/var/lib/ceph/osd/ceph-83) detect_feature: extsize is
disabled by conf
2019-03-29 06:25:37.295577 7f3c6320a8c0  1 leveldb: Recovering log #1151738
2019-03-29 06:25:37.445516 7f3c6320a8c0  1 leveldb: Delete type=0 #1151738
2019-03-29 06:25:37.445574 7f3c6320a8c0  1 leveldb: Delete type=3 #1151737
2019-03-29 07:11:50.619313 7ff6c792b700  1 leveldb: Compacting 1@3 + 12@4 files
2019-03-29 07:11:50.639795 7ff6c792b700  1 leveldb: Generated table
#1029200: 7805 keys, 2141956 bytes
2019-03-29 07:11:50.649315 7ff6c792b700  1 leveldb: Generated table
#1029201: 4464 keys, 1220994 bytes
2019-03-29 07:11:50.660485 7ff6c792b700  1 leveldb: Generated table
#1029202: 7813 keys, 2142882 bytes
2019-03-29 07:11:50.672235 7ff6c792b700  1 leveldb: Generated table
#1029203: 6283 keys, 1712810 bytes
2019-03-29 07:11:50.697949 7ff6c792b700  1 leveldb: Generated table
#1029204: 7805 keys, 2142841 bytes
2019-03-29 07:11:50.714648 7ff6c792b700  1 leveldb: Generated table
#1029205: 5173 keys, 1428905 bytes
2019-03-29 07:11:50.757146 7ff6c792b700  1 leveldb: Generated table
#1029206: 7888 keys, 2143304 bytes
2019-03-29 07:11:50.774357 7ff6c792b700  1 leveldb: Generated table
#1029207: 5168 keys, 1425634 bytes
2019-03-29 07:11:50.830276 7ff6c792b700  1 leveldb: Generated table
#1029208: 7821 keys, 2146114 bytes
2019-03-29 07:11:50.849116 7ff6c792b700  1 leveldb: Generated table
#1029209: 6106 keys, 1680947 bytes
2019-03-29 07:11:50.909866 7ff6c792b700  1 leveldb: Generated table
#1029210: 7799 keys, 2142782 bytes
2019-03-29 07:11:50.921143 7ff6c792b700  1 leveldb: Generated table
#1029211: 5737 keys, 1574963 bytes
2019-03-29 07:11:50.923357 7ff6c792b700  1 leveldb: Generated table
#1029212: 1149 keys, 310202 bytes
2019-03-29 07:11:50.923388 7ff6c792b700  1 leveldb: Compacted 1@3 + 12@4 files => 22214334 bytes
2019-03-29 07:11:50.924224 7ff6c792b700  1 leveldb: compacted to: files[ 0
3 54 715 6304 24079 0 ]
2019-03-29 07:11:50.942586 7ff6c792b700  1 leveldb: Delete type=2 #1029109

Is there a way I can skip this compaction?
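
For reference, the split/merge change we applied looks roughly like this (just a
sketch; the values are the ones mentioned further down, and the restart command
assumes a systemd-based install, so adjust for your init system):

[osd]
filestore_split_multiple = 72
filestore_merge_threshold = 480

# then restart each OSD so the new values are picked up, e.g.:
systemctl restart ceph-osd@<id>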

in.linkedin.com/in/nikhilravindra



On Fri, Mar 29, 2019 at 11:32 AM huang jun <hjwsm1989 at gmail.com> wrote:

> Nikhil R <nikh.ravindra at gmail.com> wrote on Fri, Mar 29, 2019 at 1:44 PM:
> >
> > If I comment out filestore_split_multiple = 72 and filestore_merge_threshold = 480
> > in ceph.conf, won't Ceph fall back to the default values of 2 and 10, leaving us
> > with even more splits and crashes?
> >
> Yes; the aim of that was to make it clear what causes the long start time:
> the leveldb compaction or the filestore split.
> > in.linkedin.com/in/nikhilravindra
> >
> >
> >
> > On Fri, Mar 29, 2019 at 6:55 AM huang jun <hjwsm1989 at gmail.com> wrote:
> >>
> >> It seems like the split settings cause the problem.
> >> What about commenting out those settings and seeing whether the restart
> >> still takes that long?
> >> From a quick search of the code, these two options,
> >> filestore_split_multiple = 72
> >> filestore_merge_threshold = 480
> >> do not support online change.
> >>
> >> Nikhil R <nikh.ravindra at gmail.com> wrote on Thu, Mar 28, 2019 at 6:33 PM:
> >> >
> >> > Thanks, Huang, for the reply.
> >> > It is the leveldb compaction on disk that is taking the time;
> >> > the disk I/O is completely utilized, up to 100%.
> >> > It looks like both osd_compact_leveldb_on_mount = false and
> >> > leveldb_compact_on_mount = false aren't working as expected on Ceph v10.2.9.
> >> > Is there a way to turn off compaction?
> >> >
> >> > Also, the reason we are restarting the OSDs is the splitting, for which we
> >> > increased filestore_split_multiple and filestore_merge_threshold.
> >> > Is there a way to inject these settings at runtime, or is an OSD restart the only solution?
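> >> > (By "inject" I mean something along these lines, assuming the usual
> >> > injectargs syntax:
> >> >
> >> >   ceph tell osd.* injectargs '--filestore_split_multiple 72 --filestore_merge_threshold 480'
> >> >
> >> > though I am not sure these options take effect without a restart.)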
> >> >
> >> > Thanks In Advance
> >> >
> >> > in.linkedin.com/in/nikhilravindra
> >> >
> >> >
> >> >
> >> > On Thu, Mar 28, 2019 at 3:58 PM huang jun <hjwsm1989 at gmail.com>
> wrote:
> >> >>
> >> >> Is the time really spent on the db compact operation?
> >> >> You can turn on debug_osd=20 to see what happens.
> >> >> What is the disk utilization during startup?
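> >> >> For example (a rough sketch; set the debug level before the restart and
> >> >> watch the data disk while the daemon starts):
> >> >>
> >> >>   # in ceph.conf under [osd], prior to restarting:
> >> >>   debug_osd = 20
> >> >>
> >> >>   # on the OSD host during startup:
> >> >>   iostat -x 1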
> >> >>
> >> >> Nikhil R <nikh.ravindra at gmail.com> wrote on Thu, Mar 28, 2019 at 4:36 PM:
> >> >> >
> >> >> > Ceph OSD restarts are taking far too long.
> >> >> > Below is my ceph.conf:
> >> >> > [osd]
> >> >> > osd_compact_leveldb_on_mount = false
> >> >> > leveldb_compact_on_mount = false
> >> >> > leveldb_cache_size=1073741824
> >> >> > leveldb_compression = false
> >> >> > osd_mount_options_xfs = "rw,noatime,inode64,logbsize=256k"
> >> >> > osd_max_backfills = 1
> >> >> > osd_recovery_max_active = 1
> >> >> > osd_recovery_op_priority = 1
> >> >> > filestore_split_multiple = 72
> >> >> > filestore_merge_threshold = 480
> >> >> > osd_max_scrubs = 1
> >> >> > osd_scrub_begin_hour = 22
> >> >> > osd_scrub_end_hour = 3
> >> >> > osd_deep_scrub_interval = 2419200
> >> >> > osd_scrub_sleep = 0.1
> >> >> >
> >> >> > It looks like both osd_compact_leveldb_on_mount = false and
> >> >> > leveldb_compact_on_mount = false aren't working as expected on Ceph v10.2.9.
> >> >> >
> >> >> > Any ideas on a fix would be appreciated ASAP.
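> >> >> > (For what it's worth, a way to double-check what value a running daemon
> >> >> > actually parsed, assuming the admin socket is available, would be roughly:
> >> >> >
> >> >> >   ceph daemon osd.<id> config get leveldb_compact_on_mount
> >> >> >
> >> >> > with <id> being a placeholder for one of the affected OSDs.)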
> >> >> > in.linkedin.com/in/nikhilravindra
> >> >> >
> >> >> > _______________________________________________
> >> >> > ceph-users mailing list
> >> >> > ceph-users at lists.ceph.com
> >> >> > http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com
> >> >>
> >> >>
> >> >>
> >> >> --
> >> >> Thank you!
> >> >> HuangJun
> >>
> >>
> >>
> >> --
> >> Thank you!
> >> HuangJun
>
>
>
> --
> Thank you!
> HuangJun
>