MarcEdit Update (all versions); post-mortem analysis of the marcedit.reeset.net website (continued) downtime

*** Update: as of 11:10 AM EST, 3/6/2017, it appears all resources are back online and available.  Many thanks to Darryl, a helpful support agent who found whatever magic setting was keeping the DNS from refreshing. ***


What a long weekend this has been.  If you are a MarcEdit user, you are probably aware that since late Thursday (around 11 pm EST), the MarcEdit website has shifted between being up and down, and currently (as of writing) it has disappeared completely.  I have Bluehost to thank for that, though more on that below.  More importantly, if you are a MarcEdit user, you may have found yourself having problems with the application.  On Windows, if you have plugins installed, you will find that the program starts, and then quits.  On a Mac, if you have automatic updates enabled or plugins installed, you see the same thing: the program starts, and then quits.  I’ve received a lot of email about this (2k+ as of right now), and have answered a number of questions individually, via the listserv (on Friday) and via Twitter, but generally, these past couple of days have been a real pain in the ass.  So, let’s start with the most important thing that folks want to know: how do you get past the crashing?  Well, you have a couple of options.

Update MarcEdit

I know, I know: how can you update MarcEdit when the MarcEdit website isn’t resolving?  Well, the website is there; what isn’t resolving is the DNS entry (more on why below).  The MarcEdit website is a subdomain on my primary domain, reeset.net.  It points to a folder on that domain, and that folder is still there, and its files can be accessed via their direct paths (rather than through the dedicated DNS entry).  So, where can you get the updates?  The direct links to the files are as follows:

You can see the change logs for each of these versions, if you access the links directly:

My hope is that sometime between 12 am and 8 am on 3/6/2017, the DNS entry will reset and marcedit.reeset.net will come back to life.  If it does, the problems folks are having with MarcEdit will resolve themselves, users will be prompted to download the updates referenced above, and I can put this unpleasantness behind me.  However, I will admit that my confidence in my webhost is a little shaken, so I’m not confident that everything will be back to normal by morning.  If it isn’t, the links above should get users to the update that will correct the issues that we are seeing.

Disabling AutoChecking

If you are in a position where you cannot update MarcEdit using one of the links above, and the website has not come back up yet — you will need to take a couple of steps to keep the automatic checks from running and causing the unhandled exception.  I’ve noted this on the listserv, but I’ll document the process here:

  • On Windows:
    The process that is causing the crash is the plugin autochecking code.  This code will only run if you have installed a plugin.  To disable autochecking, you will want to remove your plugins.  The easiest way to do that is to go to your User Application folder, find the plugins directory, and do one of two things: 1) delete your plugins (assuming you’ll just put them back later) or 2) rename the plugins folder to plugins.bak and create an empty plugins folder.  The next time you restart MarcEdit, it should function normally.  You will be able to re-enable your plugins when MarcEdit’s website comes back up, but the only permanent solution will be to update your version of MarcEdit.
  • On macOS:
    The process that is causing the crash is the plugin autochecking and the application autochecking.  You will need to disable the parts that apply to you.  As with the Windows instructions — this is a temporary measure — the permanent fix would be to update MarcEdit using the website (if marcedit.reeset.net is live), or directly using one of the links above.
    Disable autoupdate: You will need to open the config.xml file in the application user directory.  On a macOS system, this is found at: /Users/your_user_name/marcedit/configs.  Open the config.xml file and find the <settings></settings> XML block.  You are looking for the <autoupdate> element.  If it’s not present, autoupdating is automatically enabled.  Add the following snippet to the <settings> block:  <autoupdate>0</autoupdate>  This will disable automatic update checking.
    Disable plugin checking: You will find the plugins folder (assuming you’ve downloaded a plugin) in the application user space.  This is found at: /Users/your_user_name/marcedit/plugins — as with the Windows instructions, either delete the contents of the folder, or rename the folder as plugins.bak and create a new empty folder in its place.  Doing those two steps will disable the automatic checking, and allow you to use MarcEdit normally.
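For anyone who would rather script the macOS steps above than do them by hand, here is a minimal sketch in Python.  It is not part of MarcEdit; the function names disable_autoupdate and set_aside_plugins are made up for illustration, and it assumes that config.xml is well-formed XML with a <settings> element somewhere in it.  It keeps a config.xml.bak copy before rewriting the file, since rewriting through an XML library may change the file's formatting.

```python
import shutil
import xml.etree.ElementTree as ET
from pathlib import Path


def disable_autoupdate(config_path: Path) -> bool:
    """Set <autoupdate>0</autoupdate> inside <settings>; return True if the file changed."""
    tree = ET.parse(config_path)
    root = tree.getroot()
    # Assumption: <settings> is either the root element or a descendant of it.
    settings = root if root.tag == "settings" else root.find(".//settings")
    if settings is None:
        raise ValueError(f"no <settings> block found in {config_path}")
    node = settings.find("autoupdate")
    if node is None:
        node = ET.SubElement(settings, "autoupdate")
    if node.text == "0":
        return False  # already disabled, leave the file untouched
    # Keep a copy of the original file before rewriting it.
    shutil.copy2(config_path, config_path.with_name(config_path.name + ".bak"))
    node.text = "0"
    tree.write(config_path, encoding="utf-8", xml_declaration=True)
    return True


def set_aside_plugins(user_dir: Path) -> bool:
    """Rename plugins to plugins.bak and create an empty plugins folder in its place."""
    plugins = user_dir / "plugins"
    backup = user_dir / "plugins.bak"
    if not plugins.is_dir():
        return False  # no plugins folder, nothing to do
    if backup.exists():
        raise FileExistsError(f"{backup} already exists; move it aside first")
    plugins.rename(backup)
    plugins.mkdir()
    return True


if __name__ == "__main__":
    # /Users/your_user_name/marcedit on macOS
    user_dir = Path.home() / "marcedit"
    config = user_dir / "configs" / "config.xml"
    if config.is_file():
        disable_autoupdate(config)
    set_aside_plugins(user_dir)
```

To undo the plugin step once the website is back, delete the empty plugins folder and rename plugins.bak back to plugins.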

So what the hell happened?

Well, there are two answers to this question — and it was the convergence of these two issues that caused the current unpleasantness.

On the MarcEdit-side

I really wish that I could say that this was all on my webhost, but I can’t.  MarcEdit is a large, complicated application, and in order to keep it running well, the tool has to do a lot of housekeeping and data validation.  To ensure that all these tasks don’t cause the application to grind to a halt, I thread these processes.  Generally, for each function that runs in its own separate thread, I include a try{…}catch{…}finally{…} block to handle exceptions that might pop up.  Additionally, I have a global exception handler that handles errors that fall outside of these blocks; it protects the program from crashing.  So, what happened here?  The problem is the threading.  Because these functions run in their own threads, they fall outside of MarcEdit’s global exception handler.  This is why it is important that each of these threaded functions has its own exception handling, and that it be very good.  And, for whatever reason, I didn’t provide explicit handling of connection issues in the checkplugins code.  This is why the website being down has impacted the program.
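To illustrate the pattern (this is a Python sketch, not MarcEdit's actual code), the idea is that the function handed to the thread carries its own exception handling, with connection failures treated as an expected case.  The names check_plugins, run_guarded and start_background_check are hypothetical.  One caveat: Python only prints a traceback when a thread raises an uncaught exception, so this sketch shows the handling pattern rather than reproducing the crash itself.

```python
import threading
import urllib.request


def check_plugins(url: str, timeout: float = 5.0) -> str:
    """Hypothetical stand-in for a plugin update check: fetch a version listing."""
    with urllib.request.urlopen(url, timeout=timeout) as response:
        return response.read().decode("utf-8", errors="replace")


def run_guarded(task, results: dict, key: str, *args) -> None:
    """Thread body that never lets an exception escape the thread."""
    try:
        results[key] = ("ok", task(*args))
    except OSError as exc:
        # urllib's URLError, timeouts and refused connections are all OSError
        # subclasses: the site is unreachable, so skip the check quietly.
        results[key] = ("offline", str(exc))
    except Exception as exc:
        # Last resort, so that a bug in the task cannot escape the worker.
        results[key] = ("error", repr(exc))


def start_background_check(url: str, results: dict) -> threading.Thread:
    """Run the check off the main thread; a try/except around this call in the
    caller would NOT catch errors raised inside the worker."""
    worker = threading.Thread(
        target=run_guarded,
        args=(check_plugins, results, "plugins", url),
        daemon=True,
    )
    worker.start()
    return worker
```

The caller can look at results["plugins"] after joining the thread and simply carry on without update information when the status is "offline".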

On the Website side

I have used Bluehost as my website provider for years.  They have been a trusted partner, and while I’ve had the occasional blip, this is the first time that I can honestly say that I’ve felt like they’ve completely dropped the ball.  The problem here started with a planned server update.  Bluehost notified me that there would be a brief period of inconsistent access while they updated some server components.  This is routine stuff, and should have amounted to no more than a few minutes of downtime.  I host my website on their cloud infrastructure, which is distributed across multiple nodes, so I figured no big deal: this is exactly what a cloud infrastructure was designed for.  If they have a problem at one data center, you just enable access through another one.  Well, that’s not how it happened.  When I got up on Friday, there was an ominous and cryptic message from Bluehost letting me know that there was some trouble with the update, and they would be restoring access to service throughout the day.  Why did this affect all the nodes?  What actually happened?  Who knows.  All I know is that all Friday, all content was inaccessible and contacting Bluehost was impossible.  When things didn’t come back up by 7 pm EST, I stayed on hold for 4 hours before giving up and hoping things would look better in the morning.

Saturday morning…things didn’t look better.  Access to the administrative dashboard had been restored, but the database server was inaccessible and all subdomains were broken.  I finally got ahold of support around noon, after waiting on hold for another 3 hours.  They filed a ticket, and asked me to check back in an hour because they were going to reset the DNS.  By Saturday evening, the database server was alive again and my second hosted domain was at least visible, but the other subdomains still didn’t work.  I waited on hold again for another 2 hours, and got ahold of another person from support.  This time, we talked about what I was seeing in my DNS record and the subdomain editing tools.  Things weren’t right.  They said they would file another ticket, I thanked them, and went to bed.

Sunday morning, and things are moving in the right direction.  All content on the reeset.net domain is accessible, the blog is accessible, and the secondary domain mostly works.  My other subdomains have all disappeared.  I contacted support again; they re-established the subdomains, manually edited the DNS record…and this is where we stand.  At this point, I believe that once the DNS propagates, the last of my subdomains (including marcedit.reeset.net) will become live again.  At least that is my hope.  I’m really, really, really hoping that I don’t have to take this up with support again.

As I noted above, this is the first time that I can honestly say I have been really unhappy with the way Bluehost has handled the outage.  These things happen (they shouldn’t), and I understand that.  I work in IT, and I’ve been in the position that they are in now; I find that the best course of action is to own it, and to communicate with your customers proactively and often.  What has disappointed me the most from Bluehost has been the black hole of information.  Unless I contact support, they aren’t saying anything, and their front line support has no way to check the status of my tickets because they were escalated to their internal support units.  So, I’m basically just twiddling my thumbs until someone finally decides to let me know what’s going on, or things just start working.

In Summary

So, to summarize: there is a fix, and you can access it at the links above, or you can disable the offending code by following the instructions above.  When will the website be live?  Your guess is as good as mine at this point, but hopefully soon.  Long-term, I have a GitHub account that I use to host code, and have set up a page at: http://reeset.github.io to host information about those projects.  Given the pain this has caused, I will likely invest some time in setting up a mirror of the marcedit.reeset.net domain over there, so that if something comes up, users can still get access to the resources that they need to work.

If you have questions, feel free to let me know.

–tr

