read

I’ve had a few minor outages of this blog and other sites hosted by JustHost. I’m not particularly worried so long as they get to the bottom of it and stop it happening again.

My uptime over the last 12 months has been 99.97%, which I’m perfectly happy with.

But I’m a bit bemused by the attitude of their support people, who seem to be satisfied that the server is up when they look at it. No attempt to check if it has an internet connection, no intention to do root cause analysis, not even an admission that there is a problem.

I reproduce here my entire conversation with them over the last few days: there are some masterpieces of surreal logic. I sort of suspect they are all native English speakers who are trying to wind me up deliberately. If so, they succeeded.

Received on Mar/21/2011 10:29:11AM
Domain: noo.gs Pingdom and Altertra are reporting that my websites on cx52.justhost.com are unavailable.
Received on Mar/21/2011 10:32:13AM
OK, they're back. Would appreciate an explanation.
Sent to Dominic Sayers on Mar/21/2011 12:33:07PM
Hi Dominic, I've just checked and the website noo.gs works fine. Our server has been up for 66 days already. The causes could be different. If you still have the problem, please, provide us with the screen shot of the error. https://www.wikihow.com/Take-a-Screenshot-in-Microsoft-Windows Let us know if you need any help. -- Kind regards, Marcus Gervaise Just Host
Received on Mar/21/2011 12:57:58PM
Hi Marcus, As I said in my update several hours ago, I know the service is working again now. Two separate monitoring services (Pingdom and Alertra) notified me of the interruption from monitoring points worldwide. Attached is the detailed Pingdom log for the error. I'm looking for two things from you: (1) An acknowledgement of the service interruption. (2) An indication of your intention to do root cause analysis and inform me of the results I'm aware of your terms of service, but I'd like you to take this seriously. Your uptime figure is not a proof that connectivity has been continuously maintained. What does "The causes could be different" mean, please? The causes of what? Different from what? Regards, Dominic Sayers
Sent to Dominic Sayers on Mar/21/2011 2:01:56PM
Hi Dominic, There was no interruption on the server. One of the possible causes could be blocking of your WAN IP, because of to many parallel connections. I can't tell you for sure what causes the problem, because I don't see any error and you don't too, because as far as I understand you don't have any problem with the accessing your website. We can check only your current connection to our server. You may  check the hosting of your website via the following link: https://host-tracker.com/ Let us know if you need any help. -- Kind regards, Marcus Gervaise Just Host
Received on Mar/21/2011 5:19:40PM
Hello Marcus, "To [sic] many parallel connections" - what is the limit on parallel connections? Could you also point me to the clause in your Ts & Cs that limits the parallel connections I am allowed? Which of your Cpanel monitoring tools allows me to track the peaks in parallel connections? By the way, I am not unhappy with your service in general, just with your response to this incident. You do not seem to be taking seriously the demonstrable failure in the service I pay for. Looking at the total traffic across all my sites, this was another very low volume day. I think it unlikely that any parallel connections limit was breached. Sorry Marcus but you have so far come up with two possible explanations based on no evidence. Do I understand from this that the 24/7 network monitoring that is promised on your sales pages <https://www.justhost.com/guarantees#ic> amounts to nothing that is useful for diagnosing the root cause of my problem? Regards, Dominic Sayers
Sent to Dominic Sayers on Mar/21/2011 7:02:59PM
Dear  Dominic, Please take my apologies but, the maximum allowed quota of the connection to the server is 50. We use a couple of monitoring systems that use this limit as rule and if it is overloaded they just send warning messages. That action allows us to discover DDoS attacks. As you probably understand it would be too hard to block those IPs manually so we use our custom scripts and scripts that cPanel offers by default. Some of that IPs are blocked temporary. ( from 1 to 30 min) Each of those scripts monitor its own part of our system moreover they contacting each other in order to block the problematic part in the system and in the end they make the entire firewall system. As for this incident: I can confirm that cx52 was not down. Current uptime of your server is: 18:00:15 up 66 days, 11:56,  1 user,  load average: 2.76, 2.00, 1.94 Let me know if you have more questions. - - - - - - - Kind regards, Yates Stebljanko Just Host
Received on Mar/21/2011 7:06:35PM
Thanks for the quick reply. If the cause of the outage was not (1) connection quota or (2) server downtime then what do you think it was? Will it happen again? Regards, Dominic
Sent to Dominic Sayers on Mar/21/2011 8:11:10PM
Dear  Dominic, Unfortunately I can't say what the real problem was. I hope this never happen again. Sorry for caused inconveniences. Thank you - - - - - - - Kind regards, Yates Stebljanko
Received on Mar/22/2011 5:27:26AM
I hoped so too. Unfortunately the sites were unavailable again this morning from 04:33 to 05:44. Attached is the log from my monitoring service Pingdom. Can you say what the reason for this extended outage was?
Sent to Dominic Sayers on Mar/22/2011 6:06:43AM
HI Dominic, Sorry, but we were forced to restart our server that time (as it was a necessary kernel update there) thank you for the understanding. Please let me know if you need more help. -- Kind regards, Max Pinner Just Host
Received on Mar/22/2011 7:38:33AM
Why was it down for over an hour?
Sent to Dominic Sayers on Mar/22/2011 8:16:14AM
HI Dominic, It was not for an hour , it was just for 20-30 min from our end. It would be better to provide us with the traceroute result from your end. traceroute noo.gs It would be helpful to investigate this issue for that moment. As it might be the connect brake down from your end also. -- Kind regards, Max Pinner Just Host
Received on Mar/22/2011 8:25:28AM
Sorry, I don't understand your English. I provided you with a log of monitoring from many different cities that shows the web sites were unavailable for 1 hour 11 minutes. I do not pay just for the server to be up, I pay for it to be on the internet. Here are traceroutes from Madrid and Washington DC during the outage (03/22/2011 04:34:50AM) [traceroutes omitted for brevity (!)]
Sent to Dominic Sayers on Mar/22/2011 10:32:43AM
Dear Dominic Sayers, Please also show me your WAN IP address: https://mywanip.com/ It's necessary to check if your IP address wasn't blocked on the server. -- Kind regards, Vadim Dodds Just Host
Received on Mar/22/2011 10:37:27AM
In my last two messages I explained that the service was unavailable from multiple locations. I provided 2 kinds of evidence that this was the case. Why would the IP address from another location help? It's 217.144.158.42 here but that's irrelevant: the service is available to everyone now, including here.
Sent to Dominic Sayers on Mar/22/2011 12:04:44PM
Dear Dominic Sayers, 1) I have added your IP to white list of our server - please check your access to site now and inform me about results. 2) FSCK process was completed and now your site is working well. We apologize for the inconveniences caused to you. -- Kind regards, Vadim Dodds Just Host
Received on Mar/22/2011 12:29:35PM
This IP address was already on the whitelist. What was the cause of the 1 hour outage this morning?
Sent to Dominic Sayers on Mar/22/2011 9:22:47PM
Hi Dominic Thank you for your email. I am sorry to hear of the recent experience you have had with Just Host, I can assure you was an isolated event and we pride ourselves on exceptional uptime. Just Host is committed to providing a secure and reliable hosting environment. Customer websites are hosted on high performance quad processor servers, and our data center is equipped with a UPS power back-up generator. We perform 24/7 network monitoring, so if an issue does arise, we can address it immediately. Thank you in advance for your patience and understanding and apologies for any inconvenience caused. Kind regards, Aiden -- Aiden Pilcher
Received on Mar/23/2011 6:29:09AM
Hi Aidan, The ironic thing is that you sent this during another period of downtime. There have now been four incidents, starting on Monday after a long period of uptime (several months). This is no longer an isolated incident. I'm disappointed in the technical staff's ability to read the information I am giving them, often asking me to supply information that was in a previous email. Each shift seems to start the process from scratch as if it was a new incident. Several times they have assured me that the server is up and I must be mistaken. My point is that I am buying an internet-connected service and it is the connectivity that appears to be the issue. The monitoring attached is from Pingdom, which checks uptime from many cities around the world. All their monitoring locations found my sites unavailable at the times you can see. I do not understand why it takes up to 71 minutes to restore connectivity when it is lost. I do not understand why no root cause analysis has been done on what is clearly a systematic problem. I have had 99.97% uptime in the last 12 months. That is very good and I am happy with that level of service. Please don't let yourselves down now. Regards, Dominic Sayers
Sent to Dominic Sayers on Mar/23/2011 12:24:47PM
Hi Dominic,

Thank you for your reply.

The Just Host team are always happy to help out, please do not hesitate to contact us if you have any other questions or queries in the future. As always, feedback is greatly appreciated.

This ticket will now be closed.

Kind regards,
-- Izzy Izzy Sturgess
Received on Mar/23/2011 12:31:35PM
I just told you about a serious ongoing problem. On what basis are you closing this ticket? 1. Please keep the ticket open until the root cause analysis has been completed 2. Please let me know what you have done to prevent any further outages

<silence>