So I believe I've found the underlying issue, and I've started a debug log that records events whenever it happens. I'll be working on a fix over the coming days, as I have some ideas on exactly how to solve it...
It has to do with our load balancer, MaxScale, not distributing SQL connections to the next available node after a timeout or error (which happened because of the OOM event). A race condition then causes all SQL connections to be maxed out, which leaves no connections available, and the site goes down.
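To make the failure chain a bit more concrete, here is a minimal, simplified sketch. It is not MaxScale's code or configuration. `NodePool`, `run_query`, `simulate`, and the node names `db1`/`db2`/`db3` are all hypothetical, and the model runs requests one at a time instead of reproducing the actual race. It only shows the general idea: if requests that fail on a node are neither rerouted to the next node nor have their connection slot returned, a bounded connection limit fills up until nothing can connect.

```python
import threading


class NodePool:
    """Hypothetical bounded pool routing requests to backend nodes (illustration only)."""

    def __init__(self, nodes, max_connections):
        self.nodes = list(nodes)
        self.healthy = {node: True for node in self.nodes}
        # Each in-flight connection holds one slot; at most max_connections at once.
        self.slots = threading.BoundedSemaphore(max_connections)

    def mark_down(self, node):
        # Stop routing new requests to this node.
        self.healthy[node] = False

    def acquire(self, timeout):
        # Wait up to `timeout` seconds for a free slot.
        if not self.slots.acquire(timeout=timeout):
            raise RuntimeError("no connections available")
        # Route to the first node still considered healthy.
        for node in self.nodes:
            if self.healthy[node]:
                return node
        self.slots.release()  # give the slot back before failing
        raise RuntimeError("no healthy nodes")

    def release(self):
        self.slots.release()


def run_query(pool, failing_nodes, failover):
    node = pool.acquire(timeout=0.1)
    if node in failing_nodes:
        if failover:
            pool.mark_down(node)  # later requests go to the next node
            pool.release()        # the failed connection frees its slot
        # Without failover, the node stays "healthy" and the slot leaks.
        raise ConnectionError(f"{node} timed out")
    pool.release()  # successful requests always return their slot
    return node


def simulate(failover, requests=5, max_connections=3):
    pool = NodePool(["db1", "db2", "db3"], max_connections)
    failing = {"db1"}  # pretend db1 is the node hit by the OOM event
    results = []
    for _ in range(requests):
        try:
            results.append(run_query(pool, failing, failover))
        except ConnectionError:
            results.append("error")
        except RuntimeError as exc:
            results.append(str(exc))
    return results


print(simulate(failover=False))
print(simulate(failover=True))
```

With `failover=False`, every request keeps going to `db1`, each failure leaks a slot, and once all three slots are gone, later requests get "no connections available" (the "site down" state). With `failover=True`, only the first request errors and the rest are served by `db2`.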
Spent the entire day since my last post working on this. I've made a few changes to how I'm doing things, which freed up 16GB of RAM on the server that keeps halting due to OOM (out-of-memory) events.
There will be no difference in the user experience. I just realized I didn't need to allocate a cache the way I was doing it before. For logged-in users, zero change. For guest users and bots, there will be a tiny change, but not much (I'll measure it over the next few days using Google Webmasters).
Not to be a pain in the ass, but would it be possible for you to rearrange the top of the home page so it's somewhat organized/symmetrical? My OCD is going nuts looking at it.
-P
"Truth is not what you want it to be; it is what it is, and you must bend to its power or live a lie"-Miyamoto Musashi
Please post a screenshot of what it looks like to you. I started trying new things on the homepage yesterday, but getting the formatting right across a variety of screen sizes is difficult.
Everything is back online. Apologies for the extended downtime today. I've made some notes on what went wrong, so I can be better prepared next time. This has never happened before in 10 years... it was quite the adventure.
Glad you're on this. I never had the job of keeping a complex site up, and I'm glad I never did.
As to this never happening in 10 years, things never get simpler, do they? More technical progress means more complexity, and complexity always means more problems. As does making more changes to improve things.
Which is good for job security if you work for someone else, but is less so if you're the one in charge.... Especially if you keep trying to improve things, which means more changes and more ways for things to go wrong.
So I'm glad you're on top of this.
Bob.
When one door closes, another opens.
-- Cervantes, Don Quixote