Tuesday, October 13, 2015

Maths Tests With R

This maths problem hit the news recently, about how far a crocodile should swim up the bank before going on land, in order to catch a zebra in the shortest possible time. It appears to be an A-level question, i.e. for 18-year-old students.

The first two questions are arithmetic; more about understanding the question being asked. But the main question is obviously calculus: you are supposed to differentiate, and find out where it is zero.

I happened to have R open at the time, and my calculus is a bit rusty on how to differentiate a square root. So, this is what I typed:

T = function(x){
(5 * (36 + x^2) ^ 0.5) + 4 * (20-x)
}

(The curly brackets are optional: it could all have been on one line.)

Then to answer the three questions:

T(20)
T(0)
optimize(T, lower=0, upper=20)

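T() returns tenths of a second, so the three results (104.4, 110 and about 98) read as follows: if he swims the whole way it is 10.44 seconds, if he cuts to land immediately it takes 11 seconds, and the 3rd line tells me he should aim for a point 8 metres upstream (a 10 metre swim), then cut to land, and it will take 9.8 seconds.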

Or, if you want to see how I should have solved it, and be reminded how to do a tricky differentiation, go to https://www.youtube.com/watch?v=xko48OoTAQU and watch from 5:00 to about 10:00. For comparison, it took me less than 1 minute to write the function and get the solutions. R itself ran instantly, of course.
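For anyone who wants the pen-and-paper route next to the optimize() answer, the differentiation can be sketched as follows. This is my own working from the function typed above, not a transcript of the video; x is restricted to the same 0 to 20 range passed to optimize().

```latex
\begin{align*}
T(x) &= 5\sqrt{36 + x^2} + 4(20 - x) \\
T'(x) &= \frac{5x}{\sqrt{36 + x^2}} - 4 \\
T'(x) = 0 \;&\Rightarrow\; 5x = 4\sqrt{36 + x^2} \\
&\Rightarrow\; 25x^2 = 16(36 + x^2) \\
&\Rightarrow\; 9x^2 = 576 \\
&\Rightarrow\; x = 8 \quad (0 \le x \le 20) \\
T(8) &= 5\sqrt{100} + 4 \cdot 12 = 98
\end{align*}
```

T'(0) = -4 is negative and T'(20) is positive (about 0.79), so x = 8 is a minimum, and 98 tenths of a second matches the 9.8 seconds reported by R.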

As a data scientist, the important thing here is I use the same techniques when things get messy. If you show me enough observations of crocodiles catching zebras, I can give you an estimated function that also takes into account the speed of the flowing water, the wind speed, the age of the zebra, the water temperature, the air temperature, the weight of the crocodile, and when he last ate!

Tuesday, September 15, 2015

Markdown Editors (for Linux or cross-platform)

I’ve been using markdown more and more, and use pandoc to make a PDF from it. But it often comes out differently to how I expect, so I have been looking for an editor with live preview.

(This article has been updated, end-March 2016, to add Atom; and I've checked for any improvements in StackEdit, RStudio, NetBeans 8.1 and Remarkable. I've also added spell-checking as a requirement.)

Quick summary: all of them are (fatally) flawed.

Here is a quick review of each; well, more a summary of the flaws with each. (When I say “wrong”, in the below, I am treating pandoc, with default settings, as correct.)

stackedit.io: Online. That is a fatal flaw when looking for an offline editor! It also gets the 1-2-3 case wrong (see below). Not open source. I do like how it exports to Blogger, though. (I use it for writing blog posts.) Has the same spell-checking as your browser (which is good). Additional Fatal Flaw: your data is stored in the browser, so it is almost impossible to back up separately; I've just discovered my recent clearing of all cookies has destroyed all my articles.

RStudio: It supports its own .rmd format (which allows embedding live R code inside markdown), but can also be a normal Markdown editor. But no live preview, so you need to keep clicking “Preview HTML” to see what it looks like (though it does have syntax highlighting as you type). It handles underlines wrongly (see below). No spell-checking.

Remarkable: Gets the 1-2-3 case wrong. Open source and looks nice. But it feels black-box-ish. E.g. I don’t know how the code syntax highlighting works, and I don’t know if writing {php} or {r} is being listened to. (It highlighted a short PHP code snippet, with or without a hint, but not R code.) Another fatal flaw: it resets the preview window to the top every time you add a new line, making it useless for a document longer than one screen. It is also very slow, with a distinct sluggishness as you type.

Mark My Words: Gets both the 1-2-3 and underline cases correct in the preview window, but the underline case is wrong in the syntax highlighting in the editor window! The icons at the top are a bit confusing.

NetBeans, with the “Markdown Support” plugin. 8.0 had an unusable preview window, but as of NetBeans 8.1 that is better; however the editor and preview windows scroll completely independently, rather than staying in sync. No word-wrap in the editor window. On the plus side it gets the 1-2-3 and underline cases correct. (NB. NetBeans also has most of the problems I point out with Atom, below: in fact they are very similar.)

Haroopad: This makes me nervous, as it does not appear to be open source, and is a 40MB download. It gets the 1-2-3 case wrong. It also doesn’t do syntax highlighting (but that is not an essential feature for me). No spell-checking. No new releases the past 6 months, so this may be a dead/dying project.

Atom (built-in plugin): Currently (March 2016) this is the number one choice at a comparison of Linux Markdown Editors, so I just installed it. I think it is one to watch, because Atom is actively developed and with some more development the markdown support could become the best of the bunch. It handles 1-2-3 and underline cases correctly. There is live preview, but sadly the two panes are unconnected - when you scroll in one, the other just sits there, with no way to sync them. Also they do not agree what is correct markdown: the left window goes all weird with "*.txt", whereas the preview window handles it just fine. It underlines wrong spellings, but right-click does not suggest the correct spellings. It is more a programmer's editor than a writing tool, e.g. ctrl-b with a word highlighted does not make it bold. No print/export options.


My choice? Initially I went with Remarkable, as the best of the bunch, but I hadn't discovered the one-screenful limit at that point. (It beats Haroopad by being open source, and much smaller.) Go with stackedit.io if working in a browser is okay for you. March 2016 Update: I've been using Haroopad for the past 6 months, but the lack of spell-checking has become an irritation. Atom and NetBeans have very similar pros and cons; of the two I prefer Atom. Not sure if it is quite good enough yet to make me switch from Haroopad, though...


(Your suggestion? Let me know in the comments.)

The 1-2-3 problem. When I type:
1
2
3

(i.e. 1, 2 and 3 each on their own line, with no blank lines between them)

I should see “1 2 3”. A blank line is needed to start a new paragraph. It is nice if it shows the line break, but no good if I send that code to pandoc and all my neat formatting is lost! [BUT, maybe there is a nice flag I can give to pandoc to preserve that formatting, as I’d rather it worked that way!]

The underline problem. When I type:
Then you should open my_special_file.txt

It should not treat those underlines as italics or bold formatting. That formatting only applies with preceding whitespace:
This word is in _italics_ this one is in __bold__

That appears like this (with "italics" set in italics and "bold" set in bold):
This word is in italics this one is in bold

Thursday, September 3, 2015

Format Japanese date with kanji day-of-week

In Japanese, there are single kanji for each day of the week.
var days = ['日','月','火','水','木','金','土'];
(If you want to mutter them under your breath, at work, to impress colleagues, nichi-getsu-ka-sui-moku-kin-do.)
In JavaScript, to put them in a date use days[d.getDay()] (where d is a Date object).
I use sugar.js, which adds a format() function (amongst loads of other useful stuff) to the Date class; I now extend it further with this:
Date.prototype.format_ja_MMDDK = function(){
    var days = ['日','月','火','水','木','金','土'];
    return this.format("{MM}月{dd}日") + "(" + days[this.getDay()] + ")";
};
(If you hate underlines feel free to use formatJaMMDDK() or anything you like, for that matter.)
Here is one way you might use it (jQuery-syntax):
$('.todaysDate').html(new Date().format_ja_MMDDK());
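If you would rather not depend on sugar.js, the same output can be sketched in plain JavaScript. This is only a sketch: the standalone function name formatJaMMDDK and the constant DAYS_JA are my own, and it assumes that {MM} and {dd} in the sugar.js format string mean the zero-padded month and day.

```javascript
// Single-kanji weekday names, indexed by Date.prototype.getDay() (0 = Sunday).
const DAYS_JA = ['日', '月', '火', '水', '木', '金', '土'];

// Hypothetical standalone helper (not part of sugar.js): formats a Date
// as zero-padded month and day plus the kanji day-of-week, in local time.
function formatJaMMDDK(date) {
  const mm = String(date.getMonth() + 1).padStart(2, '0'); // getMonth() is 0-based
  const dd = String(date.getDate()).padStart(2, '0');
  return `${mm}月${dd}日(${DAYS_JA[date.getDay()]})`;
}

// 3 September 2015 was a Thursday.
console.log(formatJaMMDDK(new Date(2015, 8, 3))); // 09月03日(木)
```

Keeping it as a plain function, rather than adding to Date.prototype, avoids touching a built-in; which you prefer is a matter of taste.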

Tuesday, June 2, 2015

Easylogging++: how to get one log file per day

I introduced EasyLogging++ before. This article will build on that to show how to rotate the logs daily.

In a nutshell, assuming your log filename already has date specifiers in it, all you have to do is run these two lines, at midnight each day:

auto L = el::Loggers::getLogger("default");
L->reconfigure();

If you have multiple loggers, repeat that for all of them.

I recommend using a config file to configure EasyLogging++; but if you are configuring it completely in your code, and your FILENAME entry does not include date specifiers, you can instead change the filename at any time with this:

el::Loggers::reconfigureAllLoggers(
    el::ConfigurationType::Filename,
    "/path/to/logs/my-new-filename.log"
    );

But, going back to the first approach, here is a complete program to show creating a new log file every 20 seconds (!). First create a logging.conf file with these contents:

-- default
* GLOBAL:
    FORMAT = "%datetime{%Y-%M-%d %H:%m:%s.%g},%level,%thread,%msg"
    Milliseconds_Width = 4
    TO_FILE = true
    FILENAME = "info.%datetime{%Y%M%d_%H%m%s}.log"
    LOG_FLUSH_THRESHOLD = 5

(The FORMAT, and Milliseconds_Width lines are optional, but useful for checking it worked.)

Here is the full code:

#define _ELPP_THREAD_SAFE
#define _ELPP_NO_DEFAULT_LOG_FILE
#include <chrono>
#include <thread>
#include "easylogging++.h"
_INITIALIZE_EASYLOGGINGPP

namespace sc = std::chrono;

int main(int,char**){
    el::Loggers::configureFromGlobal("logging.conf");
    LOG(INFO) << "The program has started!";

    //Rotator thread: wakes every 20 seconds and re-reads the config,
    //which re-evaluates the date specifiers in FILENAME.
    std::thread logRotatorThread([](){
        const sc::seconds wakeUpDelta = sc::seconds(20);
        auto nextWakeUp = sc::system_clock::now() + wakeUpDelta;

        while(true){
            std::this_thread::sleep_until(nextWakeUp);
            nextWakeUp += wakeUpDelta;
            LOG(INFO) << "About to rotate log file!";
            auto L = el::Loggers::getLogger("default");
            if(L == nullptr){
                LOG(ERROR) << "Oops, it is not called default!";
            }
            else{
                L->reconfigure();
            }
        }
    });

    logRotatorThread.detach();

    //Main thread: logs a counter roughly 10 times a second
    for(int n=0; n < 1000; ++n){
        LOG(TRACE) << n;
        std::this_thread::sleep_for(sc::milliseconds(100));
    }

    LOG(INFO) << "Shutting down.";
    return 0;
}

I compiled it with this command:

g++ -std=c++11 -Wall -Werror logtest.cpp -lpthread -o logtest

and then ran it with this command:

./logtest

It should be easy to follow. I set up a dedicated thread to call reconfigure() every 20 seconds, and then the main thread logs a counter about 10 times/second.

You’ll end up with about 5 log files, and you can examine them to see that no log commands were lost.

If I was coding for a mission-critical application, where missing even a single log line would be considered Very Bad, I might set up a mutex and a lock to make sure the main thread is not active when the call to reconfigure() happens. I don’t know for sure if that is needed, or if it is guaranteed to be safe. If you know for sure one way or the other, please leave a comment!

But, for a once/day log rotation, in most applications this is a small enough risk that I would not want the overhead of the extra mutex, and I would go with the code shown above.

Thursday, March 19, 2015

Add a forwarding alias with google mail

The goal was to create a special email alias to forward to a 3rd party. E.g. accountant@example.com would be forwarded to joe.bloggs@my.accountant.com. This is easy when you host your own mail server (edit /etc/aliases) or even when your domain uses cPanel (find the forwarding icon under mail config). But my company email is hosted at Google…

Well, here are easy 22-step instructions for adding a forwarding address to a company email account hosted by Google. You will need a handkerchief, strong resolve and the co-operation of each of the people receiving the forwarded email.

  1. Sign in to Gmail
  2. Under the settings cog icon, choose “manage this domain” (do not choose “settings”!)
  3. Click users from the left menu
  4. Click the user
  5. The “Account” section is a button, click it to open it up.
  6. Scroll down to Aliases, and click “Add an alias”.
  7. Type in the alias name. Don’t be put off by the big “CANCEL” button, and the lack of a submit button. Just go down to the bottom and click SAVE CHANGES.
  8. Now, deep breath, go back to Gmail, and this time you do want to choose “settings” from the settings cog.
  9. Choose filters from the blue menu along the top.
  10. Choose “Create a new filter”
  11. Put the new alias email address in the “To:” block. (Careful: this is not the “create new filter” page, but is a search page!!)
  12. Click the “create filter with this search” link in the bottom right.
  13. Click “add a forwarding address”.
  14. Do it again.
  15. Input the address.
  16. Check your email. Click the link in the email you get. If it is a 3rd party, wait for them to click it.
  17. Sob into your handkerchief. Then console yourself, as you are nearly there.
  18. Repeat 13 to 16 if more than one address.
  19. Now go back to do steps 10, 11 and 12 again. This time don’t click “add forwarding address”, but choose your target from the dropdown box.
  20. Click save, and I think you are done. Do a test, and see if it arrives! If it does not, give it more time. For me the email arrived in my main email immediately (when it shouldn’t have at all), but then turned up in the forwardee inbox 19 seconds later.
  21. I think if you want to forward it to more than one person you need to set up a whole new forwarding filter for each forwardee.
  22. Resolve never to moan about how clunky cPanel is, ever again.

(To be fair, I suspect steps 2 to 7 are not required, which may be why I am receiving a copy. And, of course, 10, 11 and 12 were a mistake the first time round. But the above took so long that I don’t have the time to do any more experimentation today.)

Sunday, February 22, 2015

Gnucash Timezone Problem Workaround

GnuCash is accounting software. It is usable, which is the most important compliment software can get, but it has one annoying bug, first reported in 2002. Let’s assume it will never be fixed, and just work around it.

The bug is that dates are stored in the xml file as timestamps. E.g. if you say a transaction happened on 2014-06-21, then it gets stored as “2014-06-21 00:00:00”. That is the bug. The problem is that it also stores the user's current timezone. So, if I type that in when in the Asia/Tokyo timezone it will actually store it as “2014-06-21 00:00:00 +0900”. If I then open the gnucash file on a server in the Europe/London timezone, and re-save, the file now stores this: “2014-06-20 16:00:00 +0100”. It has been converted to the BST timezone, but in the UI all you now see is that the transaction is on the wrong day! (That was a real problem, but if you think that is exotic, the original bug report was made by someone who merely moved from one state in the U.S. to another state!)
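The shift is easy to reproduce outside GnuCash. Here is a minimal JavaScript sketch of the same arithmetic; it does not touch GnuCash or its file format, the helper formatInZone is a name I made up, and it assumes a runtime with Intl timezone support (a current Node.js or browser).

```javascript
// Hypothetical helper: render an instant as "YYYY-MM-DD HH:mm" in an IANA timezone.
function formatInZone(date, timeZone) {
  const parts = new Intl.DateTimeFormat('en-US', {
    timeZone,
    year: 'numeric', month: '2-digit', day: '2-digit',
    hour: '2-digit', minute: '2-digit', hourCycle: 'h23',
  }).formatToParts(date);
  // Pick the pieces out by type, so the locale's own ordering does not matter.
  const get = (type) => parts.find((p) => p.type === type).value;
  return `${get('year')}-${get('month')}-${get('day')} ${get('hour')}:${get('minute')}`;
}

// "2014-06-21", entered at midnight in Asia/Tokyo (+0900).
const entered = new Date('2014-06-21T00:00:00+09:00');

console.log(formatInZone(entered, 'Asia/Tokyo'));    // 2014-06-21 00:00
console.log(formatInZone(entered, 'Europe/London')); // 2014-06-20 16:00 (the previous day)
console.log(formatInZone(entered, 'UTC'));           // 2014-06-20 15:00
```

The instant never changes; only the timezone used to turn it back into a calendar date does, which is why the date on screen can slip by a day.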

Anyway, my proposed workaround is to always run gnucash in the UTC timezone. On Linux you do this from the command line by starting it like this:

TZ=UTC gnucash

In a desktop (windowing) environment, you might have associated your *.gnucash files, so double-clicking them opens them in gnucash. The following instructions work for xfce (thunar file manager), but I suspect gnome is exactly the same.

  1. Right click any gnucash file, and choose properties
  2. Choose open with, and at the bottom it says “Other application…”
  3. Give bash -c "TZ=UTC gnucash %f"
  4. Close it, and test it by double-clicking any gnucash file.

(I confirmed it had worked by looking at the raw XML.)

What to do if you have someone else working on your files who does not want to do this? Or is on Windows (where I don’t know a similar workaround)? Well, the only solution I know is that they temporarily move their whole machine to UTC, open, edit and save your gnucash files, then restore their machine back to their real timezone. (Remember that UTC is different from Europe/London.)

Thursday, February 5, 2015

PHP Sending mails twice?? (doing stuff twice)

Just had a mammoth troubleshooting session, because a very simple PHP script to send an email kept sending them twice. I’d only just configured postfix to send email, so I kept looking for problems there. I kept staring at the PHP code, and could see no problem, but it looked more and more like PHP was calling postfix twice. Commenting php.ini settings in and out made no difference. This was commandline PHP, so nothing to do with browser reloads, or anything like that.

Then I had the brainwave to append a random number to the bottom of the body text. The two emails had different numbers; in fact, not just that, but the 2nd email got both numbers! So it is definitely my PHP script. But I still couldn’t see it.

Stripped down, so the problem is more obvious, it looked like this:

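$bodyText = "Whatever";

class Test{

    function test(){
        // Assumed: the body string is shared between calls, which is
        // what makes the 2nd mail carry both random numbers.
        global $bodyText;
        $bodyText .= "RANDOM=" . mt_rand(1,1000);
        mail("me@example.com", "Test", $bodyText);
    }
}

$R = new Test;   // test() runs here, as the old-style constructor...
$R->test();      // ...and again here, as the explicit call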

I’ve been spending too much time jumping between languages. And I’d also got used to PHP constructors being called __construct() and forgot it still offered backwards compatibility for using the class name. Yep, that’s right, PHP functions (and classes) are case-insensitive, and so test() was being treated as the constructor of class Test. So one mail was being sent from the constructor, the second from my explicit function call. Grrrr….

(The above is also a possible explanation for problems like “PHP calls web service twice”, or “PHP has double log entries” or “PHP does something twice”!)

Written with StackEdit.