Outages (Boo!)

No one is a fan of outages, especially me! I do my best to avoid them, and much of the time it’s smooth sailing. But as customers on Koinonia and Praise have noticed, it’s not been a good few days: outage, then back online, then down again. I’m still working on the ongoing issue with Praise, but since some of you aren’t big users of Twitter or Facebook, I wanted to post a big update here while I sit and wait on the datacenter.

Koinonia had a big outage lasting over 7 hours after some datacenter issues. They have assured me it was an “isolated” issue and that it is resolved, but it doesn’t bolster confidence that days later a similar issue hit Praise. As of this writing Praise is still offline. They are working to get things back online, but they are now using the word “RESTORE” from a previous backup, so I am patiently waiting to see where we stand. Backups are kept, but rolling back to an old version always runs the risk that you won’t have your latest posts/files/emails. That’s not my desire, but if data corruption did occur, something is better than 100% loss.

This is not something I am going to stand for, though. I need each and every customer to understand that I too agree this is unacceptable. I am working on a migration plan to another datacenter, as the track record in recent months is not one I feel reflects the quality I strive to provide with FlockHosting. I am working with another company I’ve done business with over the years to deploy new hardware and, once things are back online, schedule a migration to new servers.

I know some have told me “It’s okay Chuck, this happens!” and yes, it does. But I strive to be a good steward of what God’s given me, and that’s all of you as customers. I desire the most uptime, the fewest problems, and the best deal for your dollar with services here! I am very saddened that it’s hit Church Hosting & Praise, which includes a handful of clients who’ve been with FlockHosting for over 10 years. I appreciate all of your patience, but I want to make sure everyone is stable, safe, and secure, and I can’t turn a blind eye to the recent events, which also included a customer’s own server having a multi-hour outage.

So once this calms down today (I’ll add updates to the end of this post as more info on Praise comes along), I’ll begin making the plan, and by the mid-month FlockUpdate I’ll have a timetable and what to expect. Please be on the lookout for that. I’ll send an email blast and post on the usual social networks about it too.

Again, my deepest apologies. I hope things are completely stable here shortly. Actually, I just had remote monitoring text me that things are “UP”, so I’ll begin reviewing and touching base with the L3 tech support team to see where we stand! I’ll post updates in this post shortly!

Update (1:08PM): Services are once again responding, and I’m looking over all services. Data appears to be possibly as old as the 3rd (4 days old); I’m trying to get confirmation on this as I continue to verify services. As soon as I have more I’ll update this post with additional information.

Update (1:29PM): And we’re down again 🙁 I’m very sorry for this extended outage, everyone. Hoping to get things squared away ASAP; I’m talking to the DC to see what happened and why this KEEPS happening. Prayers are appreciated!

Update (2:07PM): Still awaiting news from the datacenter. They are working on the problem and I see things in motion, but there are no updates yet. I’m keeping an eye on things and wish I had more to report, but I wanted to let you guys know things are still being worked on!

Update (2:29PM): Things are back online again, still waiting for final word from DC guys about where we stand – thanks again to everyone for your patience in all of this, I’ll be keeping an eye on things!

Update (6:30PM – Post Nap): Just got word from the datacenter on the date of the data used to restore the system. They used an image from “March 04, 2013 00:33”; other images showed damage. So you may be missing some files added since the 4th, and I apologize for that, but we have a fully working system, no user-reported issues, and a general audit by me shows things working A-OK. If you notice anything out of the ordinary, please open a trouble ticket ASAP!

Update (8:05PM): Got a report that things were funky, and it appears they were indeed. The server seems to have run out of memory (not confirmed, but it has a familiar feel), so I’m rebooting the server to see if we can get things stable. If these issues persist, I’m going to pull an all-nighter, and customers will be relocated to Faith, Solomon, and Shepherd over the course of the evening as I migrate your data off. I do apologize again for these issues. Hopefully we can wrap them up tonight one way or another.

Update (10:26PM): Sorry for the slow update, I was literally waiting for a reply. The datacenter reports there is again an issue with the storage system, the same as the original problem that started all this 🙁 It is being “worked on”. I wish I had better news, but I have also been using this time to talk to another company about some options. Thankfully the contact there is easier to get ahold of even after 10PM, so I’m not wasting time but working on solutions to get things running smoothly for you all again. I hope to have some good news shortly! Hang in there, all. Prayers are appreciated, it’s going to be a long night!

Update (10:56PM): Latest update: the storage system is being replaced and data migrated. The DC teams are pushing us to the front of the pack, so we’ll get migrated to the new system, and if that remains stable I’ll let things roll as-is for the evening. -But- I will still continue to create a plan for migration and be in touch with each of you regarding the move. I do want you guys to be stable for a while, if at all possible, before shifting accounts all over the place. More updates to follow. Thanks again to everyone for your patience and prayers!

Update (11:56PM): We got to the front of the line, which still meant all our data being migrated to a new storage system, and things feel a lot more stable. I’m not going to bed anytime soon, though. I’ll be watching how stable things are and monitoring for an hour or two, so I don’t turn in only to find out 5 minutes later that things went sideways. I’m also getting a copy of the data moved off-site for ease of access if things get bad again. In the meantime, feel free to check things out on your side too. If you see anything odd, either add to a ticket you already have open or email me what’s up. Thank You All Sooooo much for your patience and kindness through all of this!

Update (1:15AM): Into the evening I go! I am currently packaging all accounts in cPanel’s format for migrating accounts between servers. Load is low on the server currently, and this process can slow things down some, but it’s in motion now. The data will be off-network just in case anything funky happens again and I need to relocate accounts to another DC. Hopefully this early hour won’t affect anyone negatively. So far it’s been stable and speedy, but I want to get this set moved over just in case I need to light it up elsewhere! I will be adjusting DNS too, so that changes are quick if needed! Lots to do, and at times like this I consider giving coffee another try 🙂 More updates to follow!

Update (2:35AM): Only a half dozen more accounts to package and get moved over. It’s moving along quite quickly, which is nice in terms of not being up the entire night, but it also lets me stress the current configuration and ensure it’s going to be OK! God is Good! I’m even multi-tasking and picking/practicing songs to play on Sunday 🙂 So I’m making good use of the time as I monitor things! More updates soon!

Update (3:05AM): All accounts are packaged and relocated for safekeeping, and the server has been quite stable. Currently 3 hours and 30 minutes of uptime! So I’m going to leave things as-is for the evening, and I’ll draft up an official maintenance night. I don’t want to rush anything and cause additional downtime as sites propagate to another server, so I’ll continue to make the needed adjustments to get things ready to roll. I hope to have a plan up by Monday and we’ll make it happen! But again I want to say thank you to everyone for their patience and grace! It was not an easy day overall, and I’m sorry that the outage lasted so long! Praying for stability, but if need be I’ll be awake and make the move happen sooner.

Your continued prayers are appreciated! I am going to wrap up some final checks and head to bed and try to get a solid 8 hours if possible, but I’d settle for 6! 🙂
