Thursday, March 5, 2009

Comparing two arrays in PHP when elements can be in any order

This comes up so often. Usually I want to compare two CSV strings, where order does not matter.

My usual strategy is to first explode(',',$s) both strings. Then loop through one array, seeing if that element exists in the second array. If it does not then we instantly fail. If it does match then remove that element from the second array. At the end, if no failures and if the second array is empty, then they are the same.

When writing the above code for the umpteenth I wondered for the umpteenth time if there was a better way. I checked the PHP manual and still no "compare_arrays_can_be_in_any_order()" function. (Or, am I missing it - please let me know!)

But maybe I can use a couple of functions?

First idea: array_intersect(). Returns all elements of array1 that are in array2. So:

$intersect1=array_intersect($list1,$list2);
if($intersect1!==$list1)return false; //list2 is a subset of list1.
$intersect2=array_intersect($list2,$list1);
if($intersect2!==$list2)return false; //list1 is a subset of list2.
return true; //They are supersets of each other. I.e. equal.

I've not tried this, as my second idea was better: sort(). The implementation is:

function compare_arrays_can_be_in_any_order($list1,$list2){
sort($list1);
sort($list2);
return ($list1===$list2);
}

Note: this is destructive, as both list1 and list2 are modified. Written as above this is fine as I pass the arrays in by value. But be careful if pasting just that code into another function.

Here is the function to solve my original csv matching problem:

function compare_csv_strings_can_be_in_any_order($csv1,$csv2){
$list1=explode(',',$csv1);
$list2=explode(',',$csv2);
sort($list1);
sort($list2);
return ($list1===$list2);
}

Bear in mind this will only work for simple csv strings: if one uses quotes and the other doesn't, or one uses whitespace around the comma and the other doesn't, you need something more sophisticated.

Now your homework question to make sure you are paying attention: sort() takes an optional parameter of flags, to specify sorting items numerically, as strings, or as strings using current locale. Do I need to specify any flags to have my code work with any type of array data, and to work on a machine using a different locale?

What about comparing arrays of arrays?

And final question, the one of most interest to me: do you have a better way? (Better could mean any of: less code, quicker execution or fewer restrictions/disclaimers!)