[ceph-users] bluestore assertion happening mostly on cachetier SSDs with external WAL/DB nvme

Eric Nelson ericnelson at gmail.com
Tue Nov 7 16:58:15 PST 2017


Hi all,

This list has been such a great resource for me over the past few years of
using Ceph that, first off, I want to say thanks. This is the first time I've
needed to post, but I've gained a ton of insight and experience reading
responses from you helpful people.

We've been running Luminous for about a month now on a new cluster doing
just object storage, and we're in the process of rolling out bluestore to
some of the existing cluster's filestore-backed OSDs. Anyhow, we've seen a
few crashes on our cachetier bluestore OSDs that leave them in a bad state
from which I'm unable to bring them back. Early on we had some crashes due
to the jemalloc bug, which I had thought might be related, but recently a
few OSDs that never had jemalloc (I forgot to add it to the chef cookbook
that deploys these nodes) have also gotten into this state.

I filed a bug here, but in the meantime I was curious whether anyone else
has run into this error. This is 12.2.1 on Ubuntu Xenial:

http://tracker.ceph.com/issues/22066

2017-11-07 05:07:40.193162 7f569f67b700 0 log_channel(cluster) log [WRN] :
slow request 30.818396 seconds old, received at 2017-11-07 05:07:09.374162:
osd_op(client.32411500.0:704920 16.4
16:20c9bc55:::.dir.82b78eeb-9530-43f7-92a8-a79b5c
df1e04.21552069.1.261:head [call rgw.guard_bucket_resharding,call
rgw.bucket_prepare_op] snapc 0=[] ondisk+write+known_if_redirected e52947)
currently sub_op_commit_rec from 36
2017-11-07 05:07:40.672879 7f5695e68700 4 rocksdb:
[/build/ceph-12.2.1/src/rocksdb/db/db_impl_write.cc:684] reusing log 57888
from recycle list

2017-11-07 05:07:40.672990 7f5695e68700 4 rocksdb:
[/build/ceph-12.2.1/src/rocksdb/db/db_impl_write.cc:725] [default] New
memtable created with log file: #58654. Immutable memtables: 0.

2017-11-07 05:07:40.673110 7f5695e68700 1 bluefs _allocate failed to
allocate 0x100000 on bdev 0, free 0xff000; fallback to bdev 1
2017-11-07 05:07:40.911889 7f567e639700 4 rocksdb: (Original Log Time
2017/11/07-05:07:40.678363)
[/build/ceph-12.2.1/src/rocksdb/db/db_impl_compaction_flush.cc:1158]
Calling FlushMemTableToOutputFile with column family [default], flush
slots available 1, compaction slots allowed 1, compaction slots scheduled 1
2017-11-07 05:07:40.911954 7f567e639700 4 rocksdb:
[/build/ceph-12.2.1/src/rocksdb/db/flush_job.cc:264] [default] [JOB 10082]
Flushing memtable with next log file: 58654

2017-11-07 05:07:40.912077 7f567e639700 4 rocksdb: EVENT_LOG_v1
{"time_micros": 1510031260911964, "job": 10082, "event": "flush_started",
"num_memtables": 1, "num_entries": 17207993, "num_deletes": 16854639,
"memory_usage": 3244877424}
2017-11-07 05:07:40.912083 7f567e639700 4 rocksdb:
[/build/ceph-12.2.1/src/rocksdb/db/flush_job.cc:293] [default] [JOB 10082]
Level-0 flush table #58655: started
2017-11-07 05:07:41.193369 7f569f67b700 0 log_channel(cluster) log [WRN] :
6 slow requests, 5 included below; oldest blocked for > 36.431733 secs
2017-11-07 05:07:41.193380 7f569f67b700 0 log_channel(cluster) log [WRN] :
slow request 30.801090 seconds old, received at 2017-11-07 05:07:10.392216:
osd_op(client.32814103.0:158479 16.8
16:10f90c19:::.dir.82b78eeb-9530-43f7-92a8-a79b5c
df1e04.21552069.1.132:head [call rgw.guard_bucket_resharding,call
rgw.bucket_prepare_op] snapc 0=[] ondisk+write+known_if_redirected e52947)
currently sub_op_commit_rec from 36
2017-11-07 05:07:41.193403 7f569f67b700 0 log_channel(cluster) log [WRN] :
slow request 36.431733 seconds old, received at 2017-11-07 05:07:04.761574:
osd_op(client.32815468.0:1795982 16.4
16:2055d324:::.dir.82b78eeb-9530-43f7-92a8-a79b5
cdf1e04.21723796.1.98:head [call rgw.guard_bucket_resharding,call
rgw.bucket_prepare_op] snapc 0=[] ondisk+write+known_if_redirected e52947)
currently sub_op_commit_rec from 36
2017-11-07 05:07:41.193406 7f569f67b700 0 log_channel(cluster) log [WRN] :
slow request 32.197643 seconds old, received at 2017-11-07 05:07:08.995664:
osd_op(client.32324126.0:723025 16.5b
16:dbf7165f:::.dir.82b78eeb-9530-43f7-92a8-a79b5
cdf1e04.21552069.1.1021:head [call rgw.guard_bucket_resharding,call
rgw.bucket_prepare_op] snapc 0=[] ondisk+write+known_if_redirected e52947)
currently sub_op_commit_rec from 35
2017-11-07 05:07:41.193410 7f569f67b700 0 log_channel(cluster) log [WRN] :
slow request 31.876367 seconds old, received at 2017-11-07 05:07:09.316940:
osd_op(client.32411449.0:1647283 16.5b
16:dab27a8b:::.dir.82b78eeb-9530-43f7-92a8-a79b5cdf1e04.21552069.1.452:head
[call rgw.guard_bucket_resharding,call rgw.bucket_prepare_op] snapc 0=[]
ondisk+write+known_if_redirected e52947) currently sub_op_commit_rec from 35
2017-11-07 05:07:41.193414 7f569f67b700 0 log_channel(cluster) log [WRN] :
slow request 30.220324 seconds old, received at 2017-11-07 05:07:10.972983:
osd_op(client.32661681.0:159961 16.5f
16:fbe053dc:::.dir.82b78eeb-9530-43f7-92a8-a79b5cdf1e04.21552069.1.1016:head
[call rgw.guard_bucket_resharding,call rgw.bucket_prepare_op] snapc 0=[]
ondisk+write+known_if_redirected e52947) currently sub_op_commit_rec from 30
2017-11-07 05:07:46.194177 7f569f67b700 0 log_channel(cluster) log [WRN] :
1 slow requests, 1 included below; oldest blocked for > 30.814295 secs
2017-11-07 05:07:46.194186 7f569f67b700 0 log_channel(cluster) log [WRN] :
slow request 30.814295 seconds old, received at 2017-11-07 05:07:15.379818:
osd_op(client.32814103.0:158503 25.377
25:eedc0af4:::82b78eeb-9530-43f7-92a8-a79b5cdf1e04.1544167.4_melon_moderation%2f998489968764662.LKPUWVcpM1yjdrdp2FNEpHmLjLfCccn2Ho6MBuBzLdVzcXdDrYhjtJgSo8yvv9XJXtYxtKE8bmvzRkFZEdWsaGitujN9MsPSHH5s28GAvRaNcoqvKTCnWswv2mFh4uA1PhZ.9884577.jpg:head
[create,setxattr user.rgw.idtag (53),setxattr user.rgw.tail_tag
(53),writefull 0~13797,setxattr user.rgw.manifest (609),setxattr
user.rgw.acl (201),setxattr user.rgw.content_type (11),setxattr
user.rgw.etag (33),setxattr user.rgw.x-amz-acl (8),setxattr
user.rgw.x-amz-content-sha256 (65),setxattr user.rgw.x-amz-date (17),call
rgw.obj_store_pg_ver,setxattr user.rgw.source_zone (4)] snapc 0=[]
ondisk+write+known_if_redirected e52949) currently waiting for blocked
object
2017-11-07 05:07:47.194495 7f569f67b700 0 log_channel(cluster) log [WRN] :
2 slow requests, 1 included below; oldest blocked for > 31.814620 secs
2017-11-07 05:07:47.194601 7f569f67b700 0 log_channel(cluster) log [WRN] :
slow request 30.749378 seconds old, received at 2017-11-07 05:07:16.445060:
osd_op(client.32324117.0:1357608 25.1af
25:f5963916:::82b78eeb-9530-43f7-92a8-a79b5cdf1e04.1544167.4_melon_moderation%2f998489968763621.m7oZhQcTjnhyAL82D9Bd6tCVLmpKhmYXNMZNpvWz8sBc4QuHEeK1xvbxp9JV45SfZHsN1iF7d55JsEtUwZjv9wLW28H4n5QhBjanATGbLFBGqky2xyrpf3Bv48Dm7C2QTYA.10184398.jpg:head
[create,setxattr user.rgw.idtag (54),setxattr user.rgw.tail_tag
(54),writefull 0~24398,setxattr user.rgw.manifest (610),setxattr
user.rgw.acl (201),setxattr user.rgw.content_type (11),setxattr
user.rgw.etag (33),setxattr user.rgw.x-amz-acl (8),setxattr
user.rgw.x-amz-content-sha256 (65),setxattr user.rgw.x-amz-date (17),call
rgw.obj_store_pg_ver,setxattr user.rgw.source_zone (4)] snapc 0=[]
ondisk+write+known_if_redirected e52949) currently waiting for blocked
object
2017-11-07 05:07:57.314405 7f567e639700 -1
/build/ceph-12.2.1/src/include/buffer.h: In function 'void
ceph::buffer::list::prepare_iov(VectorT*) const [with VectorT =
boost::container::small_vector<iovec, 4ul>]' thread 7f567e639700 time
2017-11-07 05:07:57.235916
/build/ceph-12.2.1/src/include/buffer.h: 882: FAILED assert(_buffers.size()
<= 1024)

As mentioned in the bug, it seems similar to
http://tracker.ceph.com/issues/21932; however, some of these OSDs never
crashed at all until this started happening.
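Incidentally, the "bluefs _allocate failed to allocate 0x100000 on bdev 0,
free 0xff000; fallback to bdev 1" line in the log above may be telling. If
I'm reading the hex right, BlueFS wanted 1 MiB on bdev 0 (which I believe
is the NVMe WAL partition in our layout, though I haven't confirmed the
bdev numbering) and came up 4 KiB short, hence the fallback. Just decoding
the values:

```python
requested = 0x100000   # bytes BlueFS tried to allocate on bdev 0
free = 0xff000         # bytes reported free on bdev 0

print(requested // 2**20, "MiB requested")   # 1 MiB requested
print(free // 2**10, "KiB free")             # 1020 KiB free
print(requested - free, "bytes short")       # 4096 bytes short
```

So the WAL device was essentially full at the moment of the failure, which
may or may not be connected to the assert that follows.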

At this point I'm pretty stumped as to how best to move forward. Since this
is only happening on our cache tier, I have been blowing away the affected
OSDs and re-adding them, since recovery there is fairly quick. But as
mentioned in the bug, I'm worried we may see this on the backend OSD tier,
where recovery is much more disruptive, so I'd like to find a workaround.

Relevant ceph.conf entries:

bluestore_default_buffered_read = true
enable_experimental_unrecoverable_data_corrupting_features = bluestore rocksdb
bluestore_block_db_size = 14336
bluestore_block_wal_size = 14336
bluestore_block_size = 0
osd_objectstore = bluestore
debug_bluestore = "4/5"
rgw_dynamic_resharding = false
bluestore_cache_size = 0
bluestore_cache_size_hdd = 1073741824
bluestore_cache_size_ssd = 3221225472
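For readability, the byte values in the two cache settings above work out
to round GiB figures (just unit conversion, nothing cluster-specific):

```python
GiB = 2**30

hdd_cache = 1073741824   # bluestore_cache_size_hdd
ssd_cache = 3221225472   # bluestore_cache_size_ssd

print(hdd_cache / GiB)   # 1.0 -> 1 GiB cache per HDD OSD
print(ssd_cache / GiB)   # 3.0 -> 3 GiB cache per SSD OSD
```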


Thanks,
E

