[ceph-users] ceph-deploy failed to deploy osd randomly

Alfredo Deza adeza at redhat.com
Thu Nov 16 05:08:47 PST 2017


On Wed, Nov 15, 2017 at 8:31 AM, Wei Jin <wjin.cn at gmail.com> wrote:
> I tried purge/purgedata and then redid the deploy command a few
> times, and it still fails to start the osd.
> And there is no error log. Does anyone know what the problem is?

Seems like this is OSD 0, right? Have you checked for startup errors
in /var/log/ceph/, or checked the output of the daemon with
systemctl?
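
For example (a sketch; the unit name assumes OSD 0, adjust the id to
match the failing daemon):

    systemctl status ceph-osd@0
    journalctl -u ceph-osd@0 --no-pager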

If it still isn't working, maybe try running the OSD in the
foreground with (assuming OSD 0):

    /usr/bin/ceph-osd --debug_osd 20 -d -f --cluster ceph --id 0 \
        --setuser ceph --setgroup ceph

Behind the scenes, ceph-disk gets these devices ready and associates
them with the cluster as OSD 0. Since you've tried this many times
already, I am suspicious that the same OSD id is being reused or that
the drives are polluted from previous attempts.
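
One way to check for a leftover id before re-deploying (a sketch,
assuming the stale id is 0):

    ceph osd tree                  # look for a lingering osd.0 entry
    ceph osd crush remove osd.0    # remove it from the CRUSH map
    ceph auth del osd.0            # delete its authentication key
    ceph osd rm 0                  # remove the id from the cluster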

Seems like you are using filestore as well, so sdb1 will probably be
your data partition, mounted at /var/lib/ceph/osd/ceph-0, and sdb2
your journal, linked at /var/lib/ceph/osd/ceph-0/journal.

Make sure those are mounted and linked properly.
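
A quick way to verify (paths assume OSD 0 on that host):

    mount | grep ceph-0                            # data partition mounted?
    ls -l /var/lib/ceph/osd/ceph-0/journal         # symlink present?
    readlink -f /var/lib/ceph/osd/ceph-0/journal   # resolves to sdb2?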

> BTW, my OS is Debian with a 4.4 kernel.
> Thanks.
>
>
> On Wed, Nov 15, 2017 at 8:24 PM, Wei Jin <wjin.cn at gmail.com> wrote:
>> Hi, List,
>>
>> My machine has 12 SSD disks, and I use ceph-deploy to deploy them. But for
>> some machines/disks, it fails to start the osd.
>> I tried many times; some attempts succeed but others fail, and there is no
>> error info.
>> Following is the ceph-deploy log for one disk:
>>
>>
>> root at n10-075-012:~# ceph-deploy osd create --zap-disk n10-075-094:sdb:sdb
>> [ceph_deploy.conf][DEBUG ] found configuration file at:
>> /root/.cephdeploy.conf
>> [ceph_deploy.cli][INFO  ] Invoked (1.5.39): /usr/bin/ceph-deploy osd create
>> --zap-disk n10-075-094:sdb:sdb
>> [ceph_deploy.cli][INFO  ] ceph-deploy options:
>> [ceph_deploy.cli][INFO  ]  username                      : None
>> [ceph_deploy.cli][INFO  ]  block_db                      : None
>> [ceph_deploy.cli][INFO  ]  disk                          : [('n10-075-094',
>> '/dev/sdb', '/dev/sdb')]
>> [ceph_deploy.cli][INFO  ]  dmcrypt                       : False
>> [ceph_deploy.cli][INFO  ]  verbose                       : False
>> [ceph_deploy.cli][INFO  ]  bluestore                     : None
>> [ceph_deploy.cli][INFO  ]  block_wal                     : None
>> [ceph_deploy.cli][INFO  ]  overwrite_conf                : False
>> [ceph_deploy.cli][INFO  ]  subcommand                    : create
>> [ceph_deploy.cli][INFO  ]  dmcrypt_key_dir               :
>> /etc/ceph/dmcrypt-keys
>> [ceph_deploy.cli][INFO  ]  quiet                         : False
>> [ceph_deploy.cli][INFO  ]  cd_conf                       :
>> <ceph_deploy.conf.cephdeploy.Conf object at 0x7f566b82a110>
>> [ceph_deploy.cli][INFO  ]  cluster                       : ceph
>> [ceph_deploy.cli][INFO  ]  fs_type                       : xfs
>> [ceph_deploy.cli][INFO  ]  filestore                     : None
>> [ceph_deploy.cli][INFO  ]  func                          : <function osd at
>> 0x7f566ae9a938>
>> [ceph_deploy.cli][INFO  ]  ceph_conf                     : None
>> [ceph_deploy.cli][INFO  ]  default_release               : False
>> [ceph_deploy.cli][INFO  ]  zap_disk                      : True
>> [ceph_deploy.osd][DEBUG ] Preparing cluster ceph disks
>> n10-075-094:/dev/sdb:/dev/sdb
>> [n10-075-094][DEBUG ] connected to host: n10-075-094
>> [n10-075-094][DEBUG ] detect platform information from remote host
>> [n10-075-094][DEBUG ] detect machine type
>> [n10-075-094][DEBUG ] find the location of an executable
>> [ceph_deploy.osd][INFO  ] Distro info: debian 8.9 jessie
>> [ceph_deploy.osd][DEBUG ] Deploying osd to n10-075-094
>> [n10-075-094][DEBUG ] write cluster configuration to
>> /etc/ceph/{cluster}.conf
>> [ceph_deploy.osd][DEBUG ] Preparing host n10-075-094 disk /dev/sdb journal
>> /dev/sdb activate True
>> [n10-075-094][DEBUG ] find the location of an executable
>> [n10-075-094][INFO  ] Running command: /usr/sbin/ceph-disk -v prepare
>> --zap-disk --cluster ceph --fs-type xfs -- /dev/sdb /dev/sdb
>> [n10-075-094][WARNIN] command: Running command: /usr/bin/ceph-osd
>> --cluster=ceph --show-config-value=fsid
>> [n10-075-094][WARNIN] command: Running command: /usr/bin/ceph-osd
>> --check-allows-journal -i 0 --log-file $run_dir/$cluster-osd-check.log
>> --cluster ceph --setuser ceph --setgroup ceph
>> [n10-075-094][WARNIN] command: Running command: /usr/bin/ceph-osd
>> --check-wants-journal -i 0 --log-file $run_dir/$cluster-osd-check.log
>> --cluster ceph --setuser ceph --setgroup ceph
>> [n10-075-094][WARNIN] command: Running command: /usr/bin/ceph-osd
>> --check-needs-journal -i 0 --log-file $run_dir/$cluster-osd-check.log
>> --cluster ceph --setuser ceph --setgroup ceph
>> [n10-075-094][WARNIN] get_dm_uuid: get_dm_uuid /dev/sdb uuid path is
>> /sys/dev/block/8:16/dm/uuid
>> [n10-075-094][WARNIN] command: Running command: /usr/bin/ceph-osd
>> --cluster=ceph --show-config-value=osd_journal_size
>> [n10-075-094][WARNIN] get_dm_uuid: get_dm_uuid /dev/sdb uuid path is
>> /sys/dev/block/8:16/dm/uuid
>> [n10-075-094][WARNIN] get_dm_uuid: get_dm_uuid /dev/sdb uuid path is
>> /sys/dev/block/8:16/dm/uuid
>> [n10-075-094][WARNIN] get_dm_uuid: get_dm_uuid /dev/sdb uuid path is
>> /sys/dev/block/8:16/dm/uuid
>> [n10-075-094][WARNIN] get_dm_uuid: get_dm_uuid /dev/sdb1 uuid path is
>> /sys/dev/block/8:17/dm/uuid
>> [n10-075-094][WARNIN] get_dm_uuid: get_dm_uuid /dev/sdb2 uuid path is
>> /sys/dev/block/8:18/dm/uuid
>> [n10-075-094][WARNIN] command: Running command: /usr/bin/ceph-conf
>> --cluster=ceph --name=osd. --lookup osd_mkfs_options_xfs
>> [n10-075-094][WARNIN] command: Running command: /usr/bin/ceph-conf
>> --cluster=ceph --name=osd. --lookup osd_fs_mkfs_options_xfs
>> [n10-075-094][WARNIN] command: Running command: /usr/bin/ceph-conf
>> --cluster=ceph --name=osd. --lookup osd_mount_options_xfs
>> [n10-075-094][WARNIN] get_dm_uuid: get_dm_uuid /dev/sdb uuid path is
>> /sys/dev/block/8:16/dm/uuid
>> [n10-075-094][WARNIN] zap: Zapping partition table on /dev/sdb
>> [n10-075-094][WARNIN] command_check_call: Running command: /sbin/sgdisk
>> --zap-all -- /dev/sdb
>> [n10-075-094][WARNIN] Caution: invalid backup GPT header, but valid main
>> header; regenerating
>> [n10-075-094][WARNIN] backup header from main header.
>> [n10-075-094][WARNIN]
>> [n10-075-094][WARNIN] Warning! Main and backup partition tables differ! Use
>> the 'c' and 'e' options
>> [n10-075-094][WARNIN] on the recovery & transformation menu to examine the
>> two tables.
>> [n10-075-094][WARNIN]
>> [n10-075-094][WARNIN] Warning! One or more CRCs don't match. You should
>> repair the disk!
>> [n10-075-094][WARNIN]
>> [n10-075-094][DEBUG ]
>> ****************************************************************************
>> [n10-075-094][DEBUG ] Caution: Found protective or hybrid MBR and corrupt
>> GPT. Using GPT, but disk
>> [n10-075-094][DEBUG ] verification and recovery are STRONGLY recommended.
>> [n10-075-094][DEBUG ]
>> ****************************************************************************
>> [n10-075-094][DEBUG ] GPT data structures destroyed! You may now partition
>> the disk using fdisk or
>> [n10-075-094][DEBUG ] other utilities.
>> [n10-075-094][WARNIN] command_check_call: Running command: /sbin/sgdisk
>> --clear --mbrtogpt -- /dev/sdb
>> [n10-075-094][DEBUG ] Creating new GPT entries.
>> [n10-075-094][DEBUG ] The operation has completed successfully.
>> [n10-075-094][WARNIN] update_partition: Calling partprobe on zapped device
>> /dev/sdb
>> [n10-075-094][WARNIN] command_check_call: Running command: /sbin/udevadm
>> settle --timeout=600
>> [n10-075-094][WARNIN] command: Running command: /usr/bin/flock -s /dev/sdb
>> /sbin/partprobe /dev/sdb
>> [n10-075-094][WARNIN] command_check_call: Running command: /sbin/udevadm
>> settle --timeout=600
>> [n10-075-094][WARNIN] get_dm_uuid: get_dm_uuid /dev/sdb uuid path is
>> /sys/dev/block/8:16/dm/uuid
>> [n10-075-094][WARNIN] get_dm_uuid: get_dm_uuid /dev/sdb uuid path is
>> /sys/dev/block/8:16/dm/uuid
>> [n10-075-094][WARNIN] ptype_tobe_for_name: name = journal
>> [n10-075-094][WARNIN] get_dm_uuid: get_dm_uuid /dev/sdb uuid path is
>> /sys/dev/block/8:16/dm/uuid
>> [n10-075-094][WARNIN] create_partition: Creating journal partition num 2
>> size 40960 on /dev/sdb
>> [n10-075-094][WARNIN] command_check_call: Running command: /sbin/sgdisk
>> --new=2:0:+40960M --change-name=2:ceph journal
>> --partition-guid=2:b7f01f38-f0d5-45ba-a913-ac7242820aed
>> --typecode=2:45b0969e-9b03-4f30-b4c6-b4b80ceff106 --mbrtogpt -- /dev/sdb
>> [n10-075-094][DEBUG ] Setting name!
>> [n10-075-094][DEBUG ] partNum is 1
>> [n10-075-094][DEBUG ] REALLY setting name!
>> [n10-075-094][DEBUG ] The operation has completed successfully.
>> [n10-075-094][WARNIN] update_partition: Calling partprobe on created device
>> /dev/sdb
>> [n10-075-094][WARNIN] command_check_call: Running command: /sbin/udevadm
>> settle --timeout=600
>> [n10-075-094][WARNIN] command: Running command: /usr/bin/flock -s /dev/sdb
>> /sbin/partprobe /dev/sdb
>> [n10-075-094][WARNIN] command_check_call: Running command: /sbin/udevadm
>> settle --timeout=600
>> [n10-075-094][WARNIN] get_dm_uuid: get_dm_uuid /dev/sdb uuid path is
>> /sys/dev/block/8:16/dm/uuid
>> [n10-075-094][WARNIN] get_dm_uuid: get_dm_uuid /dev/sdb uuid path is
>> /sys/dev/block/8:16/dm/uuid
>> [n10-075-094][WARNIN] get_dm_uuid: get_dm_uuid /dev/sdb2 uuid path is
>> /sys/dev/block/8:18/dm/uuid
>> [n10-075-094][WARNIN] prepare_device: Journal is GPT partition
>> /dev/disk/by-partuuid/b7f01f38-f0d5-45ba-a913-ac7242820aed
>> [n10-075-094][WARNIN] prepare_device: Journal is GPT partition
>> /dev/disk/by-partuuid/b7f01f38-f0d5-45ba-a913-ac7242820aed
>> [n10-075-094][WARNIN] get_dm_uuid: get_dm_uuid /dev/sdb uuid path is
>> /sys/dev/block/8:16/dm/uuid
>> [n10-075-094][WARNIN] set_data_partition: Creating osd partition on /dev/sdb
>> [n10-075-094][WARNIN] get_dm_uuid: get_dm_uuid /dev/sdb uuid path is
>> /sys/dev/block/8:16/dm/uuid
>> [n10-075-094][WARNIN] ptype_tobe_for_name: name = data
>> [n10-075-094][WARNIN] get_dm_uuid: get_dm_uuid /dev/sdb uuid path is
>> /sys/dev/block/8:16/dm/uuid
>> [n10-075-094][WARNIN] create_partition: Creating data partition num 1 size 0
>> on /dev/sdb
>> [n10-075-094][WARNIN] command_check_call: Running command: /sbin/sgdisk
>> --largest-new=1 --change-name=1:ceph data
>> --partition-guid=1:6e984e11-1b4b-4741-9080-131f13a73daa
>> --typecode=1:89c57f98-2fe5-4dc0-89c1-f3ad0ceff2be --mbrtogpt -- /dev/sdb
>> [n10-075-094][DEBUG ] Setting name!
>> [n10-075-094][DEBUG ] partNum is 0
>> [n10-075-094][DEBUG ] REALLY setting name!
>> [n10-075-094][DEBUG ] The operation has completed successfully.
>> [n10-075-094][WARNIN] update_partition: Calling partprobe on created device
>> /dev/sdb
>> [n10-075-094][WARNIN] command_check_call: Running command: /sbin/udevadm
>> settle --timeout=600
>> [n10-075-094][WARNIN] command: Running command: /usr/bin/flock -s /dev/sdb
>> /sbin/partprobe /dev/sdb
>> [n10-075-094][WARNIN] command_check_call: Running command: /sbin/udevadm
>> settle --timeout=600
>> [n10-075-094][WARNIN] get_dm_uuid: get_dm_uuid /dev/sdb uuid path is
>> /sys/dev/block/8:16/dm/uuid
>> [n10-075-094][WARNIN] get_dm_uuid: get_dm_uuid /dev/sdb uuid path is
>> /sys/dev/block/8:16/dm/uuid
>> [n10-075-094][WARNIN] get_dm_uuid: get_dm_uuid /dev/sdb1 uuid path is
>> /sys/dev/block/8:17/dm/uuid
>> [n10-075-094][WARNIN] populate_data_path_device: Creating xfs fs on
>> /dev/sdb1
>> [n10-075-094][WARNIN] command_check_call: Running command: /sbin/mkfs -t xfs
>> -f -i size=2048 -- /dev/sdb1
>> [n10-075-094][DEBUG ] meta-data=/dev/sdb1              isize=2048
>> agcount=4, agsize=55984277 blks
>> [n10-075-094][DEBUG ]          =                       sectsz=4096  attr=2,
>> projid32bit=1
>> [n10-075-094][DEBUG ]          =                       crc=0        finobt=0
>> [n10-075-094][DEBUG ] data     =                       bsize=4096
>> blocks=223937105, imaxpct=25
>> [n10-075-094][DEBUG ]          =                       sunit=0      swidth=0
>> blks
>> [n10-075-094][DEBUG ] naming   =version 2              bsize=4096
>> ascii-ci=0 ftype=0
>> [n10-075-094][DEBUG ] log      =internal log           bsize=4096
>> blocks=109344, version=2
>> [n10-075-094][DEBUG ]          =                       sectsz=4096  sunit=1
>> blks, lazy-count=1
>> [n10-075-094][DEBUG ] realtime =none                   extsz=4096
>> blocks=0, rtextents=0
>> [n10-075-094][WARNIN] mount: Mounting /dev/sdb1 on
>> /var/lib/ceph/tmp/mnt.N8D5Kd with options
>> rw,noexec,noatime,attr2,inode64,logbufs=8,logbsize=256k,noquota
>> [n10-075-094][WARNIN] command_check_call: Running command: /bin/mount -t xfs
>> -o rw,noexec,noatime,attr2,inode64,logbufs=8,logbsize=256k,noquota --
>> /dev/sdb1 /var/lib/ceph/tmp/mnt.N8D5Kd
>> [n10-075-094][WARNIN] populate_data_path: Preparing osd data dir
>> /var/lib/ceph/tmp/mnt.N8D5Kd
>> [n10-075-094][WARNIN] command: Running command: /bin/chown -R ceph:ceph
>> /var/lib/ceph/tmp/mnt.N8D5Kd/ceph_fsid.11531.tmp
>> [n10-075-094][WARNIN] command: Running command: /bin/chown -R ceph:ceph
>> /var/lib/ceph/tmp/mnt.N8D5Kd/fsid.11531.tmp
>> [n10-075-094][WARNIN] command: Running command: /bin/chown -R ceph:ceph
>> /var/lib/ceph/tmp/mnt.N8D5Kd/magic.11531.tmp
>> [n10-075-094][WARNIN] command: Running command: /bin/chown -R ceph:ceph
>> /var/lib/ceph/tmp/mnt.N8D5Kd/journal_uuid.11531.tmp
>> [n10-075-094][WARNIN] adjust_symlink: Creating symlink
>> /var/lib/ceph/tmp/mnt.N8D5Kd/journal ->
>> /dev/disk/by-partuuid/b7f01f38-f0d5-45ba-a913-ac7242820aed
>> [n10-075-094][WARNIN] command: Running command: /bin/chown -R ceph:ceph
>> /var/lib/ceph/tmp/mnt.N8D5Kd
>> [n10-075-094][WARNIN] unmount: Unmounting /var/lib/ceph/tmp/mnt.N8D5Kd
>> [n10-075-094][WARNIN] command_check_call: Running command: /bin/umount --
>> /var/lib/ceph/tmp/mnt.N8D5Kd
>> [n10-075-094][WARNIN] get_dm_uuid: get_dm_uuid /dev/sdb uuid path is
>> /sys/dev/block/8:16/dm/uuid
>> [n10-075-094][WARNIN] command_check_call: Running command: /sbin/sgdisk
>> --typecode=1:4fbd7e29-9d25-41b8-afd0-062c0ceff05d -- /dev/sdb
>> [n10-075-094][DEBUG ] Warning: The kernel is still using the old partition
>> table.
>> [n10-075-094][DEBUG ] The new table will be used at the next reboot.
>> [n10-075-094][DEBUG ] The operation has completed successfully.
>> [n10-075-094][WARNIN] update_partition: Calling partprobe on prepared device
>> /dev/sdb
>> [n10-075-094][WARNIN] command_check_call: Running command: /sbin/udevadm
>> settle --timeout=600
>> [n10-075-094][WARNIN] command: Running command: /usr/bin/flock -s /dev/sdb
>> /sbin/partprobe /dev/sdb
>> [n10-075-094][WARNIN] command_check_call: Running command: /sbin/udevadm
>> settle --timeout=600
>> [n10-075-094][WARNIN] command_check_call: Running command: /sbin/udevadm
>> trigger --action=add --sysname-match sdb1
>> [n10-075-094][INFO  ] Running command: systemctl enable ceph.target
>> [n10-075-094][INFO  ] checking OSD status...
>> [n10-075-094][DEBUG ] find the location of an executable
>> [n10-075-094][INFO  ] Running command: /usr/bin/ceph --cluster=ceph osd stat
>> --format=json
>> [ceph_deploy.osd][DEBUG ] Host n10-075-094 is now ready for osd use.
> _______________________________________________
> ceph-users mailing list
> ceph-users at lists.ceph.com
> http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com

