[Fiware-lab-federation-nodes] [CESNET #134122] Re: experiences with HA

Sean Murphy sean at gopaddy.ch
Fri Nov 20 06:42:42 CET 2015


Hi all,

Thanks for all the inputs - I think the picture is becoming clearer (to me
at least).

So, iiuc, there are basically 3 options for Kilo/HA:
1. 'Legacy' router with pacemaker/corosync for failover
2. 'Legacy' router with VRRP for failover
3. DVR routers

The project has some experience with 1 and it has been shown to work
quite well. @Spain - how much has this been tested in larger scale
environments?

2 has not been evaluated in the project as far as I can see, but there have
been other reports outside the project which indicate that it could be
reasonably stable.

3 should not be considered for production environments at this time
mostly based on the Spanish experience.

We will do a test deployment of Kilo/HA based on the above info and
provide a short report on our findings - it has taken us a little while to
find some more servers to meet the increased requirements for controllers
in the HA context.

BR,
Seán.


On Thu, Nov 19, 2015 at 8:01 AM, José Ignacio Carretero <
joseignacio.carreteroguarde at telefonica.com> wrote:

> Regarding the Spanish node, the DVR configuration was done in Juno.
> The issues arose when using distributed routers. As we had routing problems
> with distributed routers, we decided to create legacy routers, but the
> installation is prepared for distributed routers. Legacy or distributed
> is only a property of the router entity.
>
> I don't think there is any confusion between the L3 agent and DVR.
> "Neutron" uses routers, which can be Legacy (as they have always been) or
> Distributed (DVR). The L3 agent can be configured to use both.
>
> * DVR is partially incompatible with HA. Only SNAT can be configured for HA.
>
> There is a third configuration for the L3 agent which may be worth looking
> at: L3-HA-VRRP
>
> Regards,
> José Ignacio
>
> On 18/11/15 at 22:08, Theofanis Katsiaounis wrote:
>
> So to sum up:
> Spain deployed HA, had issues and reverted to a single controller in
> Juno. In Kilo they deployed HA with DVR, but they had issues and
> reverted to legacy routers (which of course cancels "pure" HA).
> Giuseppe has deployed Kilo with HA (& DVR???) in a lab-only environment
> and it seems stable. Is the lab environment on real hardware or virtual?
>
> I also think there is some confusion between DVR and the L3 agent. In my
> opinion an L3 agent can be in HA without the routers being run as DVR. The
> problem with this setup is that something like what happened to me (the L3
> agent failed over but did not "carry" the L3 router/namespace information
> with it) can easily happen again. DVR creates an active/standby scenario where,
> if a node fails, a router that resides on another node will just switch to
> Active state and keep on routing the traffic.
>
> I found loads of insightful and valuable information in this blog:
> http://assafmuller.com/. I hope we can continue this discussion since I
> think it is for the good of the project and it will eventually lead to
> better, more stable implementations.
>
> Best regards,
> Fanis
>
>
> *From: *José Ignacio Carretero
> <joseignacio.carreteroguarde at telefonica.com>
> *To: *Giuseppe Cossu <giuseppe.cossu at create-net.org>
> *Cc: *"fiware-lab-federation-nodes at lists.fiware.org"
> <fiware-lab-federation-nodes at lists.fiware.org>, Cristian Cristelotti
> <cristian.cristelotti.coll at trentinonetwork.it>
> *Sent: *18/11/2015 12:00 PM
> *Subject: *Re: [Fiware-lab-federation-nodes] [CESNET #134122] Re:
> experiences with HA
>
> The problem with legacy routers is HA.
>
> Regards,
> José Ignacio.
>
> On 18/11/15 at 10:58, Giuseppe Cossu wrote:
>
> Jose',
> indeed the official OpenStack documentation reports that "the Kilo release
> increases stability and reliability of DVR considerably over the Juno
> release".
>
> Anyway, as you reported, if the legacy routers are stable, I don't see any
> problem with using them.
>
> Thanks for your feedback.
>
> Regards,
> Giuseppe
>
> On Wed, Nov 18, 2015 at 10:03 AM, José Ignacio Carretero <
> joseignacio.carreteroguarde at telefonica.com> wrote:
>
> Hi,
>
> That was what we thought: DVR seemed to be a good solution for HA, and
> that is how we configured the Spanish node. The fact is that it didn't work
> and we had many problems with DVR. I really don't think this technology is
> mature yet.
>
> The Spain2 node is configured to use DVR routers; however, we're actually
> using legacy routers only, because distributed routers were unstable.
>
> Regards,
> José Ignacio.
>
> On 17/11/15 at 14:25, Giuseppe Cossu wrote:
>
> Hi all,
> I want to share with you this link that lists the deployment scenarios for
> Neutron: http://docs.openstack.org/networking-guide/deploy.html
> As I said, the main problems using HA in OpenStack were related to Neutron,
> because the L3 agent was configured in active/passive and it was
> actually not ready to be really in HA. For that reason the OpenStack
> community has developed DVR (introduced in Juno) that - on paper -
> solves many issues related to Neutron. For sure it overcomes many Neutron
> architecture limitations (performance, scalability, bottleneck of the
> networking node).
>
> I can confirm from my direct experience that Juno with legacy L3 agent is
> quite stable in a production environment.
> Regarding Kilo, I would suggest using DVR, but as Fanis stated, there
> could be some unexpected issues... so it is up to the IOwner to choose the
> wise thing to do.
>
> NOTE: with Fuel 7.0 you cannot choose between with-HA and without-HA.
> It always deploys an HA environment, so with Fuel you have to
> manage the Corosync/Pacemaker cluster. That means that Neutron is also
> installed in HA.
> Fuel 7.0 has an additional option for the Neutron installation: you
> can choose whether or not to use DVR (if you do not select DVR, the legacy
> L3 agent is used).
>
> Regarding the OpenStack architecture and procedures using HA, Mirantis
> offers very useful documentation:
> https://docs.mirantis.com/openstack/fuel/fuel-7.0/#guides . In particular,
> regarding HA:
> https://docs.mirantis.com/openstack/fuel/fuel-7.0/operations.html and
> https://docs.mirantis.com/openstack/fuel/fuel-7.0/reference-architecture.html#multi-node-with-ha-deployment
>
> Regards,
> Giuseppe
>
>
> On Tue, Nov 17, 2015 at 1:17 PM, Sean Murphy <murp at zhaw.ch> wrote:
>
> Hi again all,
>
> To follow up on this after the discussion on the confcall this morning
> (which I found v useful - it might be good if we have more discussion of
> these important issues on the calls from time to time).
>
> The status of the Spanish node was not clear to me: I did not concretely
> understand what Fernando said regarding HA. From previous communication,
> I understand that they chose not to use HA in Juno; in the minutes of the
> meeting from today, I see
>
> "Migrated to Kilo, pending swift migration (waiting help from IBM)"
>
> @Fernando - can you tell us if you went with HA in Kilo?
>
> BR,
> Seán.
>
>
> On Mon, Nov 16, 2015 at 9:27 AM, Murphy Seán (murp) <murp at zhaw.ch> wrote:
>
> Hi Fede, all,
>
> juno HA is quite stable in our experience. the problems are always related
> to the neutron when you restart a
>
>
> Good to hear.
>
>
> node. so rule number one, if you need to restart, use corosync to call
> out your node. this will do a graceful re-balancing among l3 agents. in
> case of sudden "death" of the node, the problem is not so much in that, but
> when you re-attach the node. also in this case correct management of
> corosync is the trick.
>
>
> Thanks for the pointers - I may ask for more info on the confcall as I
> don't fully get the point here. Also, it would be good to know if this
> also applies to Kilo.
>
>
> In case you have not noticed, following the new DoW in FI-CORE and the
> Open Call, requirements on SLA and availability are quite strict, so if
> your node dies because the only controller you have is unrecoverable, and
> because of that you breach the required availability threshold, this may
> have financial implications for FI-CORE nodes.
>
>
> Thanks for pointing that out. I guess everyone has a strong interest in
> having the systems as reliable as possible - unreliable systems give lots
> of headaches. I guess what I was interested in knowing is whether HA is
> likely to make the system more reliable or less reliable: the experience
> in XiFi was that it seemed to make things less reliable.
>
> BR,
> Seán.
>
>
>
> Br,
> Federico
>
> --
> Future Internet is closer than you think!
> http://www.fiware.org
>
> Official Mirantis partner for OpenStack Training
> https://www.create-net.org/community/openstack-training
>
> --
> Dr. Federico M. Facca
>
> CREATE-NET
> Via alla Cascata 56/D
> 38123 Povo Trento (Italy)
>
> P  +39 0461 312471
> M +39 334 6049758
> E  federico.facca at create-net.org
> T @chicco785
> W  www.create-net.org
>
> On Fri, Nov 13, 2015 at 11:54 AM, Theofanis Katsiaounis <
> th_katsiaounis at neuropublic.gr> wrote:
>
> Hi all,
> Indeed Kilo could solve the network issues since networking is HA
> capable too.
> Containers/Swift can be a problem especially since you have to leave
> space to create the storage rings etc.
>
> Regards,
> Fanis
>
> On 13/11/2015 12:50 PM, Cristian Cristelotti wrote:
> > Hi Sean,
> >
> > Our experience with Grizzly (HA) was very bad. IceHouse (HA) was better
> > but not stable. Now we are on Juno on a single node and we haven't faced
> > any problems.
> > We are working on the migration to Kilo (HA + Murano + Ceilometer).
> >
> > Kilo seems to have solved the problems mentioned by Fanis.
> > If you do not deploy the node with HA you will not have containers
> > functionality, or rather you will have to install Swift manually after
> > the Fuel deployment.
> >
> >
> >
> > Regards
> >
> > Cristian
> >
> > ----- Original message -----
> > From: "Sean Murphy" <murp at zhaw.ch>
> > To: "Theofanis Katsiaounis" <th_katsiaounis at neuropublic.gr>
> > Cc: fiware-lab-federation-nodes at lists.fiware.org
> > Sent: Friday, 13 November 2015 11:40:13
> > Subject: Re: [Fiware-lab-federation-nodes] [CESNET #134122] Re:
> > experiences with HA
> >
> >
> >
> > Hi all,
> >
> >
> > So the feedback so far is the following:
> > - Riwal says that running Juno/HA is not so problematic, but has not had
> >   a specific failure situation where HA could really be tested
> > - Fernando notes that Juno/HA exhibited stability problems for larger
> >   numbers of users and decided against it
> > - Fanis notes that Icehouse/HA was quite problematic in multiple
> >   respects
> >
> >
> > From our pov, this is not painting a v positive picture regarding HA
> > and despite our inclination to experiment with newer technologies we
> > would prob opt not to use HA.
> >
> >
> > Does anyone in the project have Kilo/HA experience?
> >
> >
> > BR,
> > Seán.
> >
> >
> >
> >
> >
> >
> > On Fri, Nov 13, 2015 at 10:38 AM, Theofanis Katsiaounis <th_katsiaounis at neuropublic.gr> wrote:
> >
> >
> >
> > Hi all,
> > we had HA on Icehouse and it was a mess, especially the
> > Networking/Neutron part. Namespaces were not transferred between nodes, so
> > if one went down VMs lost networking. Reboots were a lottery indeed:
> > sometimes they worked, sometimes they did not. And when we lost power once
> > I had to rebuild the node.
> > Of course the FIWARE Lab handbook asks for an HA solution, but I see in
> > the case of Spain this has already been violated ;).
> > My two cents is that the guys from Spain made the right choice. I do not
> > think HA in OpenStack is ready for production, especially with a large
> > number of users.
> >
> > Regards,
> > Fanis
> >
> >
> > On 13/11/2015 11:33 AM, Riwal KERHERVE wrote:
> >
> >
> >
> >
> >
> > Sean,
> >
> >
> >
> > In Grizzly, any time we needed to restart processes handled by CRM, it
> > was a lottery. Sometimes everything went fine, and sometimes the processes
> > kept on rebooting and it took us hours to put things back in order.
> >
> > In Juno, we have never experienced this kind of behaviour. Whenever we
> > needed to restart processes through CRM, everything went fine.
> >
> >
> >
> > To answer your question:
> >
> > The only time we played with HA was to pick up some
> > modifications in our configuration files. I do not recall exercising HA
> > capabilities, such as having to take one node down and switch all
> > processes to the other node.
> >
> >
> >
> > BR
> >
> > Riwal
> >
> >
> >
> > From: sean at gopaddy.ch [mailto:sean at gopaddy.ch] On behalf of Sean
> > Murphy
> > Sent: Thursday, 12 November 2015 17:01
> > To: Riwal KERHERVE
> > Cc: fiware-lab-federation-nodes at lists.fiware.org
> > Subject: Re: [CESNET #134122] Re: [Fiware-lab-federation-nodes]
> > experiences with HA
> >
> >
> >
> >
> >
> >
> > Hi Riwal,
> >
> >
> >
> >
> >
> > Good feedback - thanks for that.
> >
> >
> >
> >
> >
> > As a matter of interest, have you ever needed to exercise any of the HA
> > capabilities or have you tested it in anger?
> >
> >
> >
> >
> >
> > BR,
> >
> >
> > Seán.
> >
> >
> >
> >
> >
> > On Thu, Nov 12, 2015 at 4:51 PM, Riwal KERHERVE via RT <xifi-support at rt.cesnet.cz> wrote:
> >
> > Sean,
> >
> > I do not have experience with Kilo in HA, but our node is on Juno and in
> > HA. We installed it with Fuel 6.0 (2 controllers and 1 arbitrator).
> > We have never had any trouble so far: very stable, nothing like HA
> > in Grizzly.
> >
> > BR
> > Riwal
> >
> > From: fiware-lab-federation-nodes-bounces at lists.fiware.org [mailto:
> > fiware-lab-federation-nodes-bounces at lists.fiware.org] On behalf of Sean
> > Murphy
> > Sent: Thursday, 12 November 2015 16:33
> > To: fiware-lab-federation-nodes at lists.fiware.org
> > Subject: [Fiware-lab-federation-nodes] experiences with HA
> >
> >
> >
> >
> > Hi all,
> >
> > We're looking at our upgrade strategy and we're curious to
> > hear any experience with Kilo HA both from the deployment
> > perspective as well as the operations perspective.
> >
> > From xifi, I remember Fanis reporting a split-brain scenario
> > with HA and in the end he opted not to go with a HA solution;
> > this gives me pause for thought when considering this
> > deployment solution, even though it seems to be the
> > preferred solution.
> >
> > Generally, we would be well disposed to a HA deployment
> > as we would like to learn about it, but we do not want to
> > end up deploying a technology that is too far from production
> > readiness.
> >
> > Does anyone have any experience that they can share on this
> > point?
> >
> > BR,
> > Seán.
> >
> >
> >
> >
> >
> > _______________________________________________
> > Fiware-lab-federation-nodes mailing list
> > Fiware-lab-federation-nodes at lists.fiware.org
> > https://lists.fiware.org/listinfo/fiware-lab-federation-nodes
> >
> >
> > Αποποίηση ευθυνών / Disclaimer
> >
> >
> >
>
>
>
> *Αποποίηση ευθυνών / Disclaimer* <http://www.neuropublic.gr/el/disclaimer>
>
> --
> --------------------------------------------------------
> Giuseppe Cossu
> CREATE-NET
> Smart Infrastructures
> Research Engineer
> Via alla Cascata 56/D - 38123 Povo Trento (Italy)
> e-mail: giuseppe.cossu at create-net.org
> Tel: (+39) 0461312428
> www.create-net.org
> --------------------------------------------------------
>
>
>
>
>
> ------------------------------
>
> Este mensaje y sus adjuntos se dirigen exclusivamente a su destinatario,
> puede contener información privilegiada o confidencial y es para uso
> exclusivo de la persona o entidad de destino. Si no es usted. el
> destinatario indicado, queda notificado de que la lectura, utilización,
> divulgación y/o copia sin autorización puede estar prohibida en virtud de
> la legislación vigente. Si ha recibido este mensaje por error, le rogamos
> que nos lo comunique inmediatamente por esta misma vía y proceda a su
> destrucción.
>
> The information contained in this transmission is privileged and
> confidential information intended only for the use of the individual or
> entity named above. If the reader of this message is not the intended
> recipient, you are hereby notified that any dissemination, distribution or
> copying of this communication is strictly prohibited. If you have received
> this transmission in error, do not read it. Please immediately reply to the
> sender that you have received this communication in error and then delete
> it.
>
> Esta mensagem e seus anexos se dirigem exclusivamente ao seu destinatário,
> pode conter informação privilegiada ou confidencial e é para uso exclusivo
> da pessoa ou entidade de destino. Se não é vossa senhoria o destinatário
> indicado, fica notificado de que a leitura, utilização, divulgação e/ou
> cópia sem autorização pode estar proibida em virtude da legislação vigente.
> Se recebeu esta mensagem por erro, rogamos-lhe que nos o comunique
> imediatamente por esta mesma via e proceda a sua destruição
>
>
>
>
>
>
>