[Fiware-lab-federation-nodes] [CESNET #134122] Re: experiences with HA

Theofanis Katsiaounis th_katsiaounis at neuropublic.gr
Thu Nov 19 10:50:04 CET 2015


Hi all,
After searching a bit more about DVR and HA scenarios, I see that I had the 
wrong idea: it is actually DVR that breaks HA, as Jose stated.
Moreover, VRRP scenarios seem to be pretty stable, introducing only some 
issues with TCP connection tracking, etc.
The general feeling I get is that DVR is not production-ready and L3 HA 
(VRRP) is in a much better state. This is supposed to get better in 
Liberty/Mitaka (as always).
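
To make the failover behaviour concrete, here is a minimal sketch of a 
VRRP-style election. It is an illustration only, not taken from any real 
deployment: the node names and priorities are hypothetical, and it models 
just the rule that the live instance with the highest priority becomes 
master. It does not model keepalived itself or connection-tracking state.

```python
from dataclasses import dataclass


@dataclass
class RouterInstance:
    node: str          # hypothetical network node name
    priority: int      # VRRP priority; the highest live priority wins
    alive: bool = True


def elect_master(instances):
    """Return the live instance with the highest priority, or None."""
    live = [r for r in instances if r.alive]
    if not live:
        return None
    return max(live, key=lambda r: r.priority)


# Two instances of the same virtual router on two network nodes.
instances = [
    RouterInstance("network-1", priority=100),
    RouterInstance("network-2", priority=90),
]

master_before = elect_master(instances).node
instances[0].alive = False          # simulate the first node failing
master_after = elect_master(instances).node

print(master_before, "->", master_after)
```

The sketch assumes distinct priorities, so it leaves out any tie-breaking 
between instances with equal priority.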

I've seen some more scenarios for building HA, including an Ubuntu 
suggestion of decoupling MySQL (or MariaDB) and using a Percona XtraDB 
Cluster (or maybe a Galera cluster) to make MySQL highly available. IMHO 
that sounds like a good idea, since it will save us from the 
Pacemaker/Corosync fuss. Of course, that only means that you will be able 
to recover from a dead controller, not that you have a "full" HA setup.
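
As a rough aid for sizing such a database cluster, the sketch below shows 
the majority rule that Galera-style clusters use to decide whether the 
surviving nodes may keep serving writes. It assumes equally weighted nodes 
and unexpected (not graceful) node losses; the function names and node 
counts are made up for illustration.

```python
def has_quorum(total_nodes: int, reachable_nodes: int) -> bool:
    """True if the reachable nodes form a strict majority of the cluster."""
    if not 0 <= reachable_nodes <= total_nodes:
        raise ValueError("reachable_nodes must be between 0 and total_nodes")
    return 2 * reachable_nodes > total_nodes


def tolerated_failures(total_nodes: int) -> int:
    """Largest number of simultaneous node losses that still leaves a majority."""
    return (total_nodes - 1) // 2


for size in (2, 3, 5):
    print(size, "nodes tolerate", tolerated_failures(size), "failure(s)")
```

Under this rule a two-node cluster cannot survive the loss of either node, 
which is why a third member (possibly just an arbitrator) is usually added.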

BR,
Fanis

On 19/11/2015 10:58 AM, Sean Murphy wrote:
> Hi Fanis, all,
>
>     Spain deployed HA, had issues, and reverted to a single controller
>     in Juno. In Kilo they deployed HA with DVR, but they had
>     issues and reverted to legacy routers (which of course
>     cancels "pure" HA).
>
>
> So, to be clearer on this: we think Spain has done a Kilo/HA 
> deployment without DVR.
>
> Having 'legacy' routers within some kind of failover mechanism still 
> looks better than having only one router: I know you had problems with 
> this in the past - do you know if these problems have been solved?
>
>     Giuseppe has deployed Kilo with HA (& DVR???) in a lab only
>     environment and it seems stable. Is the lab environment on real
>     hardware or Virtual??
>
>
> IIUC, Giuseppe indicated that using DVR was risky and basically 
> advised against it for production.
>
>     I also think there is confusion between DVR and the L3 agent. In
>     my opinion an L3 agent can be in HA without the routers being run
>     as DVR. The risk with this setup is that something like what
>     happened to me (the L3 agent failed over but did not "carry" the L3
>     router/namespace information with it) can easily happen again.
>
>
> My understanding was that this is exactly the VRRP case that was (more 
> or less) suggested in the confcall.
>
>     DVR creates an active/standby scenario where if a node fails a
>     router that resides on another node will just revert to Active
>     state and keep on routing the traffic.
>
>
>
>     I found loads of insightful and valuable information in this blog:
>     http://assafmuller.com/. I hope we can continue this discussion,
>     since I think it is for the good of the project and it will
>     eventually lead to better/more stable implementations.
>
>
> I strongly agree with this - information sharing on these important 
> points is v important.
>
> BR,
> Seán.
>
>     Best regards,
>     Fanis
>
>
>
>      From:   José Ignacio Carretero
>     <joseignacio.carreteroguarde at telefonica.com>
>      To:   Giuseppe Cossu <giuseppe.cossu at create-net.org>
>      Cc:   fiware-lab-federation-nodes at lists.fiware.org, Cristian
>     Cristelotti <cristian.cristelotti.coll at trentinonetwork.it>
>      Sent:   18/11/2015 12:00 PM
>      Subject:   Re: [Fiware-lab-federation-nodes] [CESNET #134122] Re:
>     experiences with HA
>
>
>      The problem with legacy routers is HA.
>
>      Regards,
>      José Ignacio.
>
>
>     On 18/11/15 at 10:58, Giuseppe Cossu wrote:
>
>     Jose',
>     indeed the official OpenStack documentation reports that "the Kilo
>     release increases stability and reliability of DVR considerably
>     over the Juno release".
>
>
>     Anyway as you reported if the legacy routers are stable, I don't
>     see any problems using them.
>
>
>     Thanks for your feedback.
>
>
>     Regards,
>     Giuseppe
>
>
>     On Wed, Nov 18, 2015 at 10:03 AM, José Ignacio Carretero
>     <joseignacio.carreteroguarde at telefonica.com> wrote:
>
>     Hi,
>
>      That was what we thought: DVR seemed to be a good solution for
>     HA, and that is how we configured the Spanish node. The fact is that
>     it didn't work and we had many problems with DVR. I really don't
>     think this technology is mature yet.
>
>      The Spain2 node is configured to use DVR routers; however, we're
>     actually using legacy routers only, because distributed routers
>     were unstable.
>
>      Regards,
>      José Ignacio.
>
>
>     On 17/11/15 at 14:25, Giuseppe Cossu wrote:
>
>
>
>     Hi all,
>     I want to share with you this link, which lists the deployment
>     scenarios for Neutron:
>     http://docs.openstack.org/networking-guide/deploy.html
>     As I said, the main problems using HA in OpenStack were related to
>     Neutron, because the L3 agent was configured in
>     active/passive mode and was not actually ready to run in HA.
>     For that reason the OpenStack community developed DVR
>     (introduced in Juno), which - on paper - solves many issues related
>     to Neutron. For sure it overcomes many limitations of the Neutron
>     architecture (performance, scalability, the bottleneck of the
>     networking node).
>
>
>
>     I can confirm from my direct experience that Juno with the legacy L3
>     agent is quite stable in a production environment.
>     Regarding Kilo I would suggest using DVR, but, as Fanis stated,
>     there could be some unexpected issues... so it is up to the
>     IOwner to choose the wise thing to do.
>
>
>     NOTE: with Fuel 7.0 you cannot choose between with-HA and
>     without-HA. It deploys an HA environment, so with Fuel
>     you have to manage the Corosync/Pacemaker cluster. That means
>     that Neutron is also installed in HA.
>     Fuel 7.0 has an additional option regarding the Neutron
>     installation: you can choose whether or not to use DVR (if you do
>     not select DVR, the legacy L3 agent is used).
>
>
>     Regarding the OpenStack architecture and procedures using HA,
>     Mirantis offers very useful documentation:
>     https://docs.mirantis.com/openstack/fuel/fuel-7.0/#guides . In
>     particular, regarding HA:
>     https://docs.mirantis.com/openstack/fuel/fuel-7.0/operations.html
>     and
>     https://docs.mirantis.com/openstack/fuel/fuel-7.0/reference-architecture.html#multi-node-with-ha-deployment
>
>
>     Regards,
>     Giuseppe
>
>
>
>
>     On Tue, Nov 17, 2015 at 1:17 PM, Sean Murphy <murp at zhaw.ch> wrote:
>
>     Hi again all,
>
>
>     To follow up on this after the discussion on the confcall this
>     morning (which I found v useful - it might be good if we have more
>     discussion of these important issues on the calls from time to time).
>
>
>     The status of the Spanish node was not clear to me: I did not
>     concretely understand what Fernando said regarding HA. From previous
>     communication, I understand that they chose not to use HA in Juno; in
>     the minutes of the meeting from today, I see
>
>
>     "Migrated to Kilo, pending swift migration (waiting help from IBM)"
>
>
>
>     @Fernando - can you tell us if you went with HA in Kilo?
>
>
>     BR,
>     Seán.
>
>
>
>
>
>
>     On Mon, Nov 16, 2015 at 9:27 AM, Murphy Seán (murp) <murp at zhaw.ch> wrote:
>
>
>     Hi Fede, all,
>
>
>
>
>
>
>
>
>     Juno HA is quite stable in our experience. The problems are always
>     related to Neutron when you restart a
>
>
>     Good to hear.
>
>
>
>     node. So rule number one: if you need to restart, use Corosync to
>     call out your node. This will do a graceful re-balancing among L3
>     agents. In case of sudden "death" of the node, the problem is not
>     so much in that, but in when you re-attach the node. In this case
>     too, correct management of Corosync is the trick.
>
>
>     Thanks for the pointers - I may ask for more info on the confcall
>     as I don't fully get the point here. Also, it would be good to know
>     if this also applies to Kilo.
>
>
>
>     In case you have not noticed, following the new DoW in FI-CORE and
>     the Open Call, requirements on SLA and availability are quite
>     strict, so if your node dies because the only controller you have
>     is unrecoverable, and because of that you breach the required
>     availability threshold, this may have financial implications for
>     FI-CORE nodes.
>
>
>     Thanks for pointing that out. I guess everyone has a strong
>     interest in having the systems as reliable as possible - unreliable
>     systems give lots of headaches. I guess what I was interested in
>     knowing is whether HA is likely to make the system more reliable or
>     less reliable: the experience in XiFi was that it seemed to make
>     things less reliable.
>
>
>     BR,
>     Seán.
>
>
>
>
>
>
>
>
>
>
>
>
>
>
>
>
>     Br,
>
>
>     Federico
>
>
>     --
>      Future Internet is closer than you think!
>     http://www.fiware.org
>
>      Official Mirantis partner for OpenStack Training
>     https://www.create-net.org/community/openstack-training
>
>      --
>      Dr. Federico M. Facca
>
>      CREATE-NET
>      Via alla Cascata 56/D
>      38123 Povo Trento (Italy)
>
>      P +39 0461 312471
>      M +39 334 6049758
>      E federico.facca at create-net.org
>      T @chicco785
>      W www.create-net.org
>
>
>
>
>
>     On Fri, Nov 13, 2015 at 11:54 AM, Theofanis Katsiaounis
>     <th_katsiaounis at neuropublic.gr> wrote:
>
>     Hi all,
>      Indeed Kilo could solve the network issues since networking is HA
>      capable too.
>      Containers/Swift can be a problem especially since you have to leave
>      space to create the storage rings etc.
>
>      Regards,
>      Fanis
>
>      On 13/11/2015 12:50 PM, Cristian Cristelotti wrote:
>
>
>     > Hi Sean,
>      >
>      > Our experience with Grizzly (HA) was very bad. Icehouse (HA) was
>      > better but not stable. Now we are on Juno on a single node and we
>      > haven't faced any problems.
>      > We are working on the migration to Kilo (HA + Murano +
>      > Ceilometer).
>      >
>      > Kilo seems to have solved the problems mentioned by Fanis.
>      > If you do not deploy the node with HA, you will not have container
>      > functionality; or rather, you will have to install Swift manually
>      > after the Fuel deployment.
>      >
>      >
>      >
>      > Regards
>      >
>      > Cristian
>      >
>      > ----- Original message -----
>      > From: "Sean Murphy" <murp at zhaw.ch>
>      > To: "Theofanis Katsiaounis" <th_katsiaounis at neuropublic.gr>
>      > Cc: fiware-lab-federation-nodes at lists.fiware.org
>      > Sent: Friday, 13 November 2015 11:40:13
>      > Subject: Re: [Fiware-lab-federation-nodes] [CESNET #134122] Re:
>      > experiences with HA
>      >
>      >
>      >
>      > Hi all,
>      >
>      >
>
>
>      > So the feedback so far is the following:
>      > - Riwal says that running Juno/HA is not so problematic, but has
>      >   not had a specific failure situation where HA could really be
>      >   tested
>      > - Fernando notes that Juno/HA exhibited stability problems for
>      >   larger numbers of users and decided against it
>      > - Fanis notes that Icehouse/HA was quite problematic in multiple
>      >   respects
>      >
>      >
>      > From our pov, this is not painting a v positive picture regarding
>      > HA, and despite our inclination to experiment with newer
>      > technologies we would prob opt not to use HA.
>      >
>      >
>      > Does anyone in the project have Kilo/HA experience?
>      >
>      >
>      > BR,
>      > Seán.
>      >
>      >
>      >
>      >
>      >
>      >
>      > On Fri, Nov 13, 2015 at 10:38 AM, Theofanis Katsiaounis
>      > <th_katsiaounis at neuropublic.gr> wrote:
>      >
>      >
>      >
>      > Hi all,
>      > We had HA on Icehouse and it was a mess, especially the
>      > Networking/Neutron part. Namespaces were not transferred between
>      > nodes, so if one went down the VMs lost networking. Reboots were
>      > a lottery indeed: sometimes they worked, sometimes they did not.
>      > And when we lost power once, I had to rebuild the node.
>      > Of course the FIWARE Lab handbook asks for an HA solution, but I
>      > see in the case of Spain this has already been violated ;).
>      > My two cents is that the guys from Spain made the right choice.
>      > I do not think HA in OpenStack is ready for production, especially
>      > with a large number of users.
>      >
>      > Regards,
>      > Fanis
>      >
>      >
>      > On 13/11/2015 11:33 AM, Riwal KERHERVE wrote:
>      >
>      >
>      >
>      >
>      >
>      > Sean,
>      >
>      >
>      >
>      > In Grizzly, any time we needed to restart processes handled by
>      > CRM, it was a lottery. Sometimes everything went fine, and
>      > sometimes the processes kept restarting and it took us hours to
>      > put things back in order.
>      >
>      > In Juno, we have never experienced this kind of behavior. Whenever
>      > we needed to restart processes through CRM, everything went fine.
>      >
>      >
>      >
>      > To answer your question:
>      >
>      > The only time we played with HA was to apply some modifications
>      > to our configuration files. I do not recall exercising the HA
>      > capabilities, such as taking one node down and switching all
>      > processes to the other node.
>      >
>      >
>      >
>      > BR
>      >
>      > Riwal
>      >
>      >
>      >
>      > From: sean at gopaddy.ch, on behalf of Sean Murphy
>      > Sent: Thursday, 12 November 2015 17:01
>      > To: Riwal KERHERVE
>      > Cc: fiware-lab-federation-nodes at lists.fiware.org
>      > Subject: Re: [CESNET #134122] Re: [Fiware-lab-federation-nodes]
>      > experiences with HA
>      >
>      >
>      >
>      >
>      >
>      >
>      > Hi Riwal,
>      >
>      >
>      >
>      >
>      >
>      > Good feedback - thanks for that.
>      >
>      >
>      >
>      >
>      >
>      > As a matter of interest, have you ever needed to exercise any of
>      > the HA capabilities or have you tested it in anger?
>      >
>      >
>      >
>      >
>      >
>      > BR,
>      >
>      >
>      > Seán.
>      >
>      >
>      >
>      >
>      >
>      > On Thu, Nov 12, 2015 at 4:51 PM, Riwal KERHERVE via RT
>      > <xifi-support at rt.cesnet.cz> wrote:
>      >
>      > Sean,
>      >
>      > I do not have experience with Kilo in HA, but our node is on
>      > Juno and in HA. We installed it with Fuel 6.0 (2 controllers and 1
>      > arbitrator). We have never had any trouble so far: it is very
>      > stable, nothing like HA in Grizzly.
>      >
>      > BR
>      > Riwal
>      >
>      > From: fiware-lab-federation-nodes-bounces at lists.fiware.org, on
>      > behalf of Sean Murphy
>      > Sent: Thursday, 12 November 2015 16:33
>      > To: fiware-lab-federation-nodes at lists.fiware.org
>      > Subject: [Fiware-lab-federation-nodes] experiences with HA
>      >
>      >
>      >
>      >
>      > Hi all,
>      >
>      > We're looking at our upgrade strategy and we're curious to
>      > hear any experience with Kilo HA both from the deployment
>      > perspective as well as the operations perspective.
>      >
>      > From xifi, I remember Fanis reporting a split-brain scenario
>      > with HA and in the end he opted not to go with an HA solution;
>      > this gives me pause for thought when considering this
>      > deployment solution, even though it seems to be the
>      > preferred solution.
>      >
>      > Generally, we would be well disposed to an HA deployment
>      > as we would like to learn about it, but we do not want to
>      > end up deploying a technology that is too far from production
>      > readiness.
>      >
>      > Does anyone have any experience that they can share on this
>      > point?
>      >
>      > BR,
>      > Seán.
>      >
>      >
>      >
>      >
>      >
>      > _______________________________________________
>      > Fiware-lab-federation-nodes mailing list
>     Fiware-lab-federation-nodes at lists.fiware.org
>     <mailto:Fiware-lab-federation-nodes at lists.fiware.org>
>     https://lists.fiware.org/listinfo/fiware-lab-federation-nodes
>      >
>      >
>      > Disclaimer
>      >
>      >
>      >
>
>
>
>
>
>
>
>
>
>
>
>
>
>
>
>
>      --
>
>
>
>
>     --------------------------------------------------------
>      Giuseppe Cossu
>      CREATE-NET
>      Smart Infrastructures
>      Research Engineer
>      Via alla Cascata 56/D - 38123 Povo Trento (Italy)
>      e-mail: giuseppe.cossu at create-net.org
>      Tel: (+39) 0461312428
>     www.create-net.org
>      --------------------------------------------------------
>
>
>
>
>
>     ----------------
>
>
>      The information contained in this transmission is privileged and
>     confidential information intended only for the use of the
>     individual or entity named above. If the reader of this message is
>     not the intended recipient, you are hereby notified that any
>     dissemination,  distribution or copying of this communication is
>     strictly prohibited. If you have received this transmission in
>     error, do not read it. Please immediately reply to the sender that
>     you have received this communication in error and then delete it.
>
>
>
>
>
>
>
>
>
>
>
>
>
>
>
>


