Thank you JuanFra!

I've read the link you posted; however, I'm still not completely clear on it.

As far as I know, in the case of a ping timeout, once the server is accessible again a reconnect is issued, the client starts talking to the server again, and the lock tables are updated. After that, operations are carried out normally.

Since the reconnect does happen but operation does not return to normal (rw access before, ro access after), that brings us to the next point: ext4.

My bricks are formatted with xfs; however, the partition that cinder attaches to the instance is formatted as ext4. Given that going read-only seems to be ext4's way of dealing with lost disk access, could I avoid this issue by mounting the ext4 partition with 'errors=continue'? And what would happen if I formatted this same partition as xfs instead, do you know?
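
A minimal sketch for checking, from inside an instance, whether an ext4 filesystem has been remounted read-only and which errors= policy is listed among its mount options. It parses text in the /proc/mounts layout (device, mount point, type, options). The sample device /dev/vdb and mount point /mnt/data are hypothetical, and whether errors= shows up in the listing depends on how the filesystem was mounted, so a missing value should be read as "not listed", not as a particular default.

```python
# Hypothetical sample in /proc/mounts layout; on a real instance,
# read the text from /proc/mounts instead.
SAMPLE = """\
/dev/vda1 / ext4 rw,relatime 0 0
/dev/vdb /mnt/data ext4 ro,relatime,errors=remount-ro 0 0
"""


def parse_mounts(text):
    """Parse /proc/mounts-style text into a list of dicts."""
    entries = []
    for line in text.splitlines():
        fields = line.split()
        if len(fields) < 4:
            continue  # skip blank or malformed lines
        device, mountpoint, fstype, options = fields[:4]
        entries.append({
            "device": device,
            "mountpoint": mountpoint,
            "fstype": fstype,
            "options": options.split(","),
        })
    return entries


def readonly_ext4(entries):
    """Return the ext4 entries whose options include 'ro'."""
    return [e for e in entries
            if e["fstype"] == "ext4" and "ro" in e["options"]]


def errors_policy(entry):
    """Return the value of the errors= option, or None if not listed."""
    for opt in entry["options"]:
        if opt.startswith("errors="):
            return opt.split("=", 1)[1]
    return None


if __name__ == "__main__":
    for entry in readonly_ext4(parse_mounts(SAMPLE)):
        print(entry["mountpoint"], "is read-only; errors policy:",
              errors_policy(entry))
```

Running this periodically inside an instance would at least show when the volume flips to read-only and with which policy, which helps when comparing 'errors=continue' against the current behaviour.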

On Thu, Sep 11, 2014 at 9:59 AM, JuanFra Rodriguez Cardoso <juanfra.rodriguez.cardoso@gmail.com> wrote:
Hi Elias:

This post by Joe Julian may help you solve that problem:

http://joejulian.name/blog/keeping-your-vms-from-going-read-only-when-encountering-a-ping-timeout-in-glusterfs/

Regards,
---
JuanFra Rodriguez Cardoso

2014-09-11 3:21 GMT+02:00 Elías David <elias.moreno.tec@gmail.com>:

Hello,

I'm seeing a consistent behaviour in my openstack deployment (libvirt/kvm) with cinder using glusterfs, and I'm having trouble finding the real cause, or telling whether it's abnormal at all.

I have configured cinder to use glusterfs as its storage backend. The volume is a replica 2 of 8 disks in 2 servers, and I have several cinder-provided volumes attached to several instances. The problem is this: it is not uncommon for one of the gluster servers to reboot suddenly due to power failures (an infrastructure problem that is unavoidable right now). When this happens, the instances start to see the attached volume as read-only, which forces me to hard reboot the instance so it can access the volume normally again.

Here are my doubts: the gluster volume is created in such a way that no replica is on the same server as its master, so if I lose a server due to hardware failure, the other is still usable. I don't really understand why the instances couldn't just use the replica brick when one of the servers reboots.

Also, why is the data still there and readable, but not writable, after a glusterfs failure? Is this a problem with my implementation? A configuration error on my part? Something known to openstack? A cinder thing? libvirt? glusterfs?

Having to hard reboot the instances is not a big issue right now, but I still want to understand what's happening and whether I can avoid it.

Some specifics:

GlusterFS version is 3.5
All systems are CentOS 6.5
Openstack version is Icehouse, installed with packstack/rdo

Thanks in advance!


--
Elías David.

--
Elías David.