Showing posts with label networking. Show all posts

Monday, September 10, 2012

They are SPOFs hiding everywhere!

One of my test systems sent me a few hundred emails between 2:38 and 7:45am JST. It is just a test server, so the alerts didn't go to my phone, and I only noticed the problem around 7:30am. No paying clients were affected.

I dived straight in and found my Rackspace UK box couldn't resolve api.qqtrend.com. But DNS lookup worked for me. Other DNS lookups were working on the Rackspace box. I also found it couldn't ping the DNS server (at GoDaddy). But I could from my office LAN. And I also could from a U.S. server.

Conclusion (totally wrong - see below): Rackspace UK data centre had issues.

So I logged in to Rackspace, to check for alerts and post a support ticket. There was a mention, phrased very vaguely, saying "someone else" has DNS problems. Hhmmm. By this time I could ping the GoDaddy server, but DNS lookup still failed. Uncertain, I pinged around a bit more, and by that time DNS had started working.

I.e. the underlying problem had already been fixed, it was just taking time to spread through the internet, and I could have "fixed it" with no effort by just staying in bed 15 minutes longer. Oh well.

Anyway, it turns out it was a sociopath attacking GoDaddy: http://www.bbc.co.uk/news/business-19549367


Here are his reasons:
    "i'm taking godaddy down bacause well i'd like to test how the cyber security is safe and for more reasons that i can not talk now." (sic)

OK, English may not be his first language. But even allowing for that, he is not coming across as an upstanding member of society. Not protesting, no cause, just wanting to see if he had the skills to annoy a lot of people. This is a guy (gal?) who badly needs a girlfriend/boyfriend. If you know him, introduce him to someone. Please.

I know GoDaddy had a big PR screw-up by initially supporting SOPA, but they had the courage and sense to listen to people and change their position. Still a good company in my mind.

But, the silver lining is it nicely illustrated we (QQ Trend) have a Single Point Of Failure, at the DNS and registrar level, that had been overlooked. We'd previously got server1 and server2 as the two endpoints. We have them in different continents, and different cloud providers (no secret: Amazon and Rackspace). And I thought that was solidity to boast about. I was about to add server3 in a third continent as an option for customers who really, really need 100% uptime.

But what today's problem reveals is that if all three servers are on the same domain: server1.example.com, server2.example.com, server3.example.com, then we have a potential issue with DNS, and even with the registrar.

I think we need an alternative domain, at a completely different registrar, with DNS at an independent ISP. Then at the script level we add that in as one of the failover endpoints. They'll point to the same three servers.
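A minimal sketch of that script-level failover, assuming C++17. The example.net names stand in for the alternative domain and are placeholders, and probe is a hypothetical callback the caller supplies (a DNS lookup plus a health check, say); none of this is the actual QQ Trend client code.

```cpp
#include <cassert>
#include <functional>
#include <optional>
#include <string>
#include <vector>

// Hypothetical endpoint list: the same three servers, reachable under two
// independent domains (example.com and example.net are placeholders).
inline std::vector<std::string> default_endpoints() {
    return {
        "server1.example.com", "server2.example.com", "server3.example.com",
        "server1.example.net", "server2.example.net", "server3.example.net",
    };
}

// Returns the first endpoint for which probe() reports success, or
// std::nullopt if every endpoint fails. probe is supplied by the caller,
// so this function itself does no network I/O.
inline std::optional<std::string> first_reachable(
    const std::vector<std::string>& endpoints,
    const std::function<bool(const std::string&)>& probe) {
    assert(probe && "probe must be callable");
    for (const auto& host : endpoints) {
        if (probe(host)) return host;
    }
    return std::nullopt;
}
```

If the first domain's DNS is down, every example.com probe fails and the loop falls through to the example.net names, which resolve through a different registrar and DNS provider.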

For instance, however big Amazon or GoDaddy (or any infrastructure provider) get, however many data centres they have around the world, they are still open to attacks, politics and human error inside their organization. We're service providers building on top of their infrastructure. It is our job to accept their limitations and do something about it.



Thursday, February 16, 2012

shared_from_this causing Exception tr1::bad_weak_ptr

I've been having a rotten week, with my boost::asio program keeping on crashing... and it is not even doing the real work yet. It crashes when a client disconnects. The error message is:

   Exception: tr1::bad_weak_ptr

My code has now been littered with debug lines, lots of them showing usage counts of the shared_ptr in the hope of tracking down at what stage it is going wrong:

   std::cout << "this.use_count=" << shared_from_this().use_count() << "\n";

If that is unfamiliar, my class is defined like this:
   class client_session :
     public boost::enable_shared_from_this< client_session >{ ... }

This allows me to pass around shared pointers to this, from inside the class being pointed at, and is one of the essential tools you need to do anything useful with boost::asio.

I've been reading tutorials, studying other people's code, and progressively adding more shared pointers around objects that I am sure do not really need it. Nothing would shake it.

My code is using cross-references: the client connection object stores a vector of references to the data sources it uses, and each data source stores a vector of references to the clients subscribed to it. When I say reference I mean it holds a smart pointer instance. It is not that complicated but surely the problem must be in that cross-referencing? So, in desperation I deleted the entire data source class, and all that subscribing and unsubscribing code. Eh? It still crashes.

But then I noticed this code:
  ~client_session(){
    std::cout << "In client_session destructor (this.use_count="
        << shared_from_this().use_count() << ")\n";
    unsubscribe_from_all(); //'cos we'll no longer be valid after this
    }

I knew (!!) the problem was not in the destructor, but had to (!!) be before that point, because that first debug line was never reached. If you're already laughing at me, have a healthy helping of kudos. Yes, it was that call to shared_from_this() causing the crash! I was reaching the destructor, but crashing before it could print my debug line.

You see, in C++, an object does not really exist until the end of the constructor, and no longer really exists once you enter the destructor. For shared_from_this() that means no shared_ptr owns the object yet in the constructor, and by the time the destructor runs the last owner has already let go, so there is nothing left to share. You must not use shared_from_this() in the constructor or the destructor, or in any function called from them. And when I thought again about what unsubscribe_from_all() (which was also calling shared_from_this()) does, I realized the destructor could never be called while any data source still holds a reference to us. So that call is not needed. The destructor code became:

  ~client_session(){
    std::cout << "In client_session destructor.\n";
    assert(subscriptions.size()==0);
    }

...and the crashes went away.
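The failure can be reproduced in a few lines. This is a sketch using std::enable_shared_from_this rather than the boost/tr1 versions my code used, and the session class and helper names are made up for the illustration. On the standard library implementations I know of, the use count has already dropped to zero when the destructor runs, so shared_from_this() throws std::bad_weak_ptr there.

```cpp
#include <cassert>
#include <memory>

// Records what the destructor observed, so it can be checked afterwards.
static bool g_threw_bad_weak_ptr = false;

// Hypothetical stand-in for client_session.
class session : public std::enable_shared_from_this<session> {
public:
    ~session() {
        try {
            // The last owning shared_ptr has already released the object,
            // so there is no owner left to share with.
            auto self = shared_from_this();
            (void)self;
        } catch (const std::bad_weak_ptr&) {
            // Caught here on purpose: an exception escaping a destructor
            // would terminate the program.
            g_threw_bad_weak_ptr = true;
        }
    }

    // Safe while a shared_ptr owns us. The count includes the temporary
    // shared_ptr that shared_from_this() returns.
    long owners() { return shared_from_this().use_count(); }
};

// Creates and destroys a session, then reports whether the destructor's
// call to shared_from_this() threw.
inline bool destructor_call_throws() {
    g_threw_bad_weak_ptr = false;
    {
        auto s = std::make_shared<session>();
        assert(s->owners() == 2);  // s plus the temporary
    }  // s goes out of scope here and ~session() runs
    return g_threw_bad_weak_ptr;
}
```

Without the try/catch the exception would escape the destructor, and since C++11 destructors are noexcept by default, that ends in std::terminate: a crash before any debug output, much like the one I was chasing.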

There is something very, very annoying about knowing the bug I've been chasing for *two solid days* was in the debug code I added to track down the bug.

Saturday, September 17, 2011

Add a temporary static IP address

At home, with wired ethernet, my (Ubuntu) notebook has a few static IP addresses that I use for developing websites. Out of the house, I use wicd, so I have a dynamic IP address, and those static IPs don't exist. wicd configuration is too complex for me to understand, so I just accept this, but it caught me short the other day when I needed to have both an internet connection and to be able to work on a website running on my notebook.

I failed then, but I'm ready for next time. To temporarily add a static IP address you simply do (as root):
ifconfig eth0:3 10.10.10.10 netmask 255.255.0.0
I'm choosing "eth0:3" for the interface; it can be any unused number after the colon, and you never need to care what this is. netmask can really be anything for our purposes. The 10.10.10.10 is the IP address I've given it. Test with this:
ping 10.10.10.10
To set up a quick virtual host create a file under /etc/apache2/conf.d called 10.10.10.10.conf (any filename is fine) with these contents:
<VirtualHost 10.10.10.10:80>
    DocumentRoot "/var/www/somewhere"
    ServerName 10.10.10.10
</VirtualHost>
 
Tidyup
To remove just the interface that you added above, use this command:
ip addr del 10.10.10.10/32 dev eth0:3
Or, to restore the network to boot defaults (useful if you have done lots of changes) you can do:
ifdown -a
ifup -a
Either way, to then remove the Apache config, delete the 10.10.10.10.conf file you created and restart Apache.

Tuesday, January 18, 2011

Buffalo Air Station setup on existing home LAN

I found this surprisingly difficult; though it turns out the steps involved are quite easy...once you know how.

For this article I'll assume your existing LAN is 10.0.0.0/8 with 10.0.0.1 as the default gateway. I'm going to give the wireless LAN router 10.1.2.3. Substitute 10.0.0.1 and 10.1.2.3 below for something suitable for your LAN (and change the 255.0.0.0 network mask accordingly).
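When substituting your own values, the address you pick for the router has to sit on the same network as the default gateway under the mask you enter. A small sketch of that check, assuming C++17; parse_ipv4 and same_subnet are names invented for this example, not part of any router tooling.

```cpp
#include <cstdint>
#include <cstdio>
#include <optional>
#include <string>

// Parses dotted-quad IPv4 text into a 32-bit value; nullopt if malformed.
inline std::optional<std::uint32_t> parse_ipv4(const std::string& text) {
    unsigned a = 0, b = 0, c = 0, d = 0;
    char extra = 0;
    // Exactly four numbers must match; a fifth conversion means trailing junk.
    if (std::sscanf(text.c_str(), "%u.%u.%u.%u%c", &a, &b, &c, &d, &extra) != 4)
        return std::nullopt;
    if (a > 255 || b > 255 || c > 255 || d > 255) return std::nullopt;
    return static_cast<std::uint32_t>((a << 24) | (b << 16) | (c << 8) | d);
}

// True when both addresses share the same network part under the netmask.
// Returns false if any of the three strings fails to parse.
inline bool same_subnet(const std::string& ip_a, const std::string& ip_b,
                        const std::string& netmask) {
    const auto a = parse_ipv4(ip_a);
    const auto b = parse_ipv4(ip_b);
    const auto m = parse_ipv4(netmask);
    if (!a || !b || !m) return false;
    return (*a & *m) == (*b & *m);
}
```

With the values in this article, same_subnet("10.1.2.3", "10.0.0.1", "255.0.0.0") is true, whereas a tighter mask such as 255.255.0.0 would put the router and the gateway on different networks.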

These instructions assume factory reset status, with the "router" switch (on the case) set to "On", rather than "Off" or "Auto". The AirStation is connected to the LAN hub using the Internet socket (the blue one).

1. Connect a computer to the Air Station directly (with ethernet cable, to one of the four LAN sockets), and set your computer to use DHCP, so that it gets an IP address that can see the Air Station! (Most likely your computer will receive 192.168.11.2.)
Note: The computer you connect with does not need to have a wireless card in it; it does have to have wired ethernet though.
Tip: Don't use your main computer for this step; that way you will be able to still use your main computer for: a) googling when you hit problems; b) ping tests (see step 5 below).

2. Connect by browser to 192.168.11.1
Use root as the username, with blank password.

3. Internet/LAN | Internet
IP Manual: 10.1.2.3
255.0.0.0
Extended:
Default GW: 10.0.0.1
DNS: (whatever DNS server you use: this is what will be handed out to wireless clients that connect using DHCP)

4. Wait for it to restart. (I also enabled pings at this point.)

5. Then cycle the power; this appears to be essential.
Now you should be able to ping 10.1.2.3 from outside, and also ping from the connected computer to 10.0.0.1. And if you can do that then you can also connect to the internet! You are done, time for a nice cuppa.

6. Test from a wireless device to make sure you are required to type in the key.

7. Set a root password.