[Backlogmanager] [FIWARE-JIRA] (HELP-15954) [fiware-askbot] I have a problem/question regarding the init procedure

José Ignacio Carretero Guarde (JIRA) jira-help-desk at jira.fiware.org
Mon Nov 4 11:50:00 CET 2019


     [ https://jira.fiware.org/browse/HELP-15954?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

José Ignacio Carretero Guarde resolved HELP-15954.
--------------------------------------------------
    Resolution: Done

> [fiware-askbot] I have a problem/question regarding the init procedure
> ----------------------------------------------------------------------
>
>                 Key: HELP-15954
>                 URL: https://jira.fiware.org/browse/HELP-15954
>             Project: Help-Desk
>          Issue Type: Monitor
>            Reporter: Backlog Manager
>            Assignee: José Ignacio Carretero Guarde
>              Labels: fiware-orion, health-check
>
> Created question in FIWARE Q/A platform on 19-06-2019 at 21:06
> Please ANSWER this question AT https://ask.fiware.org/question/1085/i-have-a-problemquestion-regarding-the-init-procedure/
> Question:
> I have a problem/question regarding the init procedure
> Description:
> Hello all,
> I'm trying to deploy the Orion ContextBroker on an OpenShift/OKD (Kubernetes) cluster and I'm having a problem with its deployment regarding the initialization time.
> I'm using the 2.2.0 release tag for the ContextBroker with MongoDB 3.2.0. The startup args for Orion are:
> "-ipv4 -reqPoolSize 100 -notificationMode threadpool:10000:50 -statNotifQueue -statCounters -statSemWait -statTiming -relogAlarms -httpTimeout 100000"
> The initialization appears to be somewhat inconsistent regarding the time required for the app to become available.
> Sometimes the deployment runs "smoothly" and sometimes the app fails to start (in a reasonable time interval).
> The initialization seems to freeze at a certain point, which appears in the logs as [1]. The actual service isn't started ("lsof -i -n -P" doesn't show any process using port 1026).
> I use standard health checks which, basically, do a "curl localhost:1026/version". I've tried modifying the timeouts and also the initial delay after which the probe fires. Even with a 360-second (6-minute) delay I don't get consistent deployments!
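The probe behaviour described above (an initial delay, then repeated requests to "localhost:1026/version") can be reproduced outside the cluster to time how long Orion really takes to open its port. What follows is only a minimal sketch: the function names, the default delays and the attempt limit are illustrative assumptions, not values taken from the reporter's deployment or from Orion itself.

```python
import http.client
import time
import urllib.error
import urllib.request


def http_probe(url, timeout):
    """Return True if a GET on url answers with HTTP 200 within timeout seconds."""
    try:
        with urllib.request.urlopen(url, timeout=timeout) as resp:
            return resp.status == 200
    except (urllib.error.URLError, http.client.HTTPException, OSError):
        # Connection refused, timeout, non-2xx answer, malformed reply: not ready.
        return False


def wait_until_ready(url, initial_delay=30.0, period=10.0, timeout=5.0,
                     max_attempts=30, probe=http_probe, sleep=time.sleep):
    """Poll url like a readiness probe.

    Waits initial_delay seconds, then probes every period seconds.
    Returns the number of the first successful attempt, or None if all
    max_attempts probes failed. probe and sleep are injectable so the
    loop can be exercised without a running broker (hypothetical helpers).
    """
    sleep(initial_delay)
    for attempt in range(1, max_attempts + 1):
        if probe(url, timeout):
            return attempt
        if attempt < max_attempts:
            sleep(period)
    return None
```

Calling wait_until_ready("http://localhost:1026/version") from inside the container and recording the returned attempt number over several restarts would show how widely the startup time varies.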
> I've tested with different resource allocation and this doesn't seem to be the problem.
> Also, by checking the logs I see some "odd" intervals in the initialization procedure. I have some excerpts at the end of the message, [2], where I can see the last steps of the init procedure being executed (or, at least, logged) at precisely 1-minute intervals.
> The problem is that once the readiness health check fails, the deployment fails as well. From what I saw, Orion seems to use a lot of RAM which does not get released even when the notification load disappears. The recommendation would be to restart the process, which, in my case, can be handled automatically if I set an upper memory limit for the container. So the initialization process comes into question again, and it matters for the auto-scaling mechanism as well.
> Any hints towards how to solve this problem would be much appreciated!
> Thanks,
> Dan
> ==========================
> [1] - last 10-12 lines from ContextBroker's log; DEBUG -t 0-255
> =============================
> time=Wednesday 19 Jun 16:50:35 2019.407Z | lvl=DEBUG | corr=N/A | trans=N/A | from=N/A | srv=N/A | subsrv=N/A | comp=Orion | op=connectionOperations.cpp[802]:getWriteConcern | msg=getWriteConcern()
> time=Wednesday 19 Jun 16:50:35 2019.407Z | lvl=INFO | corr=N/A | trans=N/A | from=N/A | srv=N/A | subsrv=N/A | comp=Orion | op=connectionOperations.cpp[807]:getWriteConcern | msg=Database Operation Successful (getWriteConcern)
> time=Wednesday 19 Jun 16:50:35 2019.407Z | lvl=DEBUG | corr=N/A | trans=N/A | from=N/A | srv=N/A | subsrv=N/A | comp=Orion | op=mongoConnectionPool.cpp[240]:mongoConnect | msg=Active DB Write Concern mode: 1
> time=Wednesday 19 Jun 16:50:35 2019.431Z | lvl=DEBUG | corr=N/A | trans=N/A | from=N/A | srv=N/A | subsrv=N/A | comp=Orion | op=connectionOperations.cpp[691]:runCollectionCommand | msg=runCommand() in 'admin' collection: '{ buildinfo: 1 }'
> time=Wednesday 19 Jun 16:50:35 ... (more)
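
The "odd" intervals mentioned in the question can be located mechanically by parsing the "time=" field of each log line and reporting consecutive entries that are far apart. The sketch below assumes the "key=value | key=value" layout visible in excerpt [1]; the function names and the 30-second threshold are illustrative choices, and a "msg" field that itself contains " | " would be split incorrectly.

```python
from datetime import datetime, timedelta

# Matches e.g. "Wednesday 19 Jun 16:50:35 2019.407Z" (English day/month names,
# i.e. the default C locale is assumed).
TIME_FORMAT = "%A %d %b %H:%M:%S %Y.%fZ"


def parse_line(line):
    """Split one log line into a dict of fields; return None if it is not parseable."""
    start = line.find("time=")  # skips a leading "> " quote prefix, if any
    if start < 0:
        return None
    fields = {}
    for part in line[start:].split(" | "):
        key, sep, value = part.partition("=")
        if sep:
            fields[key.strip()] = value.strip()
    try:
        fields["time"] = datetime.strptime(fields["time"], TIME_FORMAT)
    except (KeyError, ValueError):
        return None  # truncated or malformed timestamp
    return fields


def find_gaps(lines, threshold=timedelta(seconds=30)):
    """Return (gap, previous op, next op) for consecutive entries at least threshold apart."""
    gaps = []
    previous = None
    for line in lines:
        entry = parse_line(line)
        if entry is None:
            continue
        if previous is not None:
            gap = entry["time"] - previous["time"]
            if gap >= threshold:
                gaps.append((gap, previous.get("op"), entry.get("op")))
        previous = entry
    return gaps
```

Feeding the full broker log to find_gaps would list the operations on either side of each long pause, which narrows down where the initialization stalls.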



--
This message was sent by Atlassian JIRA
(v6.4.1#64016)

