
Numpy Sort Weird Behavior

I'm looking at the answers to an earlier question I asked, "numpy.unique with order preserved". They work great, but I have problems with one example. b ['Aug-09' 'Aug-09' 'Aug-09' .

Solution 1:

Update:

I believe the problem is actually this. What version of Numpy are you running?

http://projects.scipy.org/numpy/ticket/2063

I reproduced your problem because the Ubuntu installation of Numpy I tested on was 1.6.1, and the bug was fixed in 1.6.2.

Upgrade Numpy and try again; after upgrading, it worked for me on my Ubuntu machine.
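If upgrading is not an option, one possible workaround is to skip np.unique and remove duplicates in plain Python. This is a minimal sketch, not something from the original answer; the helper name unique_in_order is hypothetical, and it assumes the items are hashable (strings are).

```python
def unique_in_order(items):
    """Return the distinct items in order of first appearance.

    Hypothetical helper for illustration; it does not use NumPy at all,
    so it is unaffected by the np.unique issue discussed above.
    """
    seen = set()
    result = []
    for item in items:
        if item not in seen:
            seen.add(item)
            result.append(item)
    return result


# Illustrative data only, shaped like the month labels in the question.
labels = ['Aug-09', 'Aug-09', 'Sep-09', 'Aug-09', 'Oct-09']
print(unique_in_order(labels))
```

The result is a plain list; wrap it in np.array(...) if an array is needed downstream.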


In these lines:

bi, idxb = np.unique(b, return_index=True)
months = bi[np.argsort(idxb)]

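I originally saw two mistakes here (though, as the update above says, the real cause turned out to be the Numpy bug):

  1. Use the sorted indices on the original array, b[...], rather than on bi.
  2. Use the sorted indices themselves, not the indices that would sort them, so call sort rather than argsort.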

This should work:

bi, idxb = np.unique(b, return_index=True)
months = b[np.sort(idxb)]
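A small self-contained sketch of what these two lines do. The array below is made-up stand-in data, not the data set from the question. On a Numpy version where return_index is correct, the argsort form from the question gives the same order, which fits the update's conclusion that the bug was the culprit.

```python
import numpy as np

# Illustrative data only: a short stand-in for the month labels in the question.
b = np.array(['Aug-09', 'Aug-09', 'Sep-09', 'Oct-09', 'Sep-09', 'Jan-10'])

# bi holds the unique values in sorted order; idxb[i] is the position in b
# where bi[i] first appears.
bi, idxb = np.unique(b, return_index=True)

# Sorting the first-occurrence positions and indexing b with them
# restores the order in which the values first appear in the input.
months = b[np.sort(idxb)]

# The form from the question: reorder the sorted uniques by first occurrence.
months_argsort = bi[np.argsort(idxb)]

print(months.tolist())
print(months_argsort.tolist())
```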

Yes, it does work, using your data set with Python 2.7 and Numpy 1.7 on 64-bit Mac OS 10.6:

Python 2.7.3 (default, Oct 23 2012, 13:06:50)

IPython 0.13.1 -- An enhanced Interactive Python.

In [1]: import numpy as np

In [2]: np.__version__
Out[2]: '1.7.0'

In [3]: from platform import architecture

In [4]: architecture()
Out[4]: ('64bit', '')

In [5]: f = open('test.txt','r')

In [6]: lines = np.array([line.strip() for line in f.readlines()])

In [7]: _, ilines = np.unique(lines, return_index=True)

In [8]: months = lines[np.sort(ilines)]

In [9]: months
Out[9]: 
array(['Aug-09', 'Sep-09', 'Oct-09', 'Nov-09', 'Dec-09', 'Jan-10',
       'Feb-10', 'Mar-10', 'Apr-10', 'May-10', 'Jun-10', 'Jul-10',
       'Aug-10', 'Sep-10', 'Oct-10', 'Nov-10', 'Dec-10', 'Jan-11',
       'Feb-11', 'Mar-11', 'Apr-11', 'May-11', 'Jun-11', 'Jul-11',
       'Aug-11', 'Sep-11', 'Oct-11', 'Nov-11', 'Dec-11', 'Jan-12',
       'Feb-12', 'Mar-12', 'Apr-12', 'May-12', 'Jun-12', 'Jul-12',
       'Aug-12', 'Sep-12', 'Oct-12', 'Nov-12', 'Dec-12', 'Jan-13'], 
      dtype='|S6')

OK, I can finally reproduce your problem on Ubuntu 64 bit too:

Python 2.7.3 (default, Aug  1 2012, 05:14:39)

IPython 0.12.1 -- An enhanced Interactive Python.

In [1]: import numpy as np

In [2]: np.__version__
Out[2]: '1.6.1'

In [3]: from platform import architecture

In [4]: architecture()
Out[4]: ('64bit', 'ELF')

In [5]: f = open('test.txt','r')

In [6]: lines = np.array([line.strip() for line in f.readlines()])

In [7]: _, ilines = np.unique(lines, return_index=True)

In [8]: months = lines[np.sort(ilines)]

In [9]: months
Out[9]: 
array(['Feb-10', 'Aug-10', 'Nov-10', 'Oct-12', 'Oct-11', 'Jul-10',
       'Feb-12', 'Sep-11', 'Jan-10', 'Apr-10', 'May-10', 'Sep-09',
       'Mar-11', 'Jun-12', 'Jul-12', 'Dec-09', 'Aug-09', 'Nov-12',
       'Dec-12', 'Apr-12', 'Jun-11', 'Jan-11', 'Jul-11', 'Sep-10',
       'Jan-12', 'Dec-10', 'Oct-09', 'Nov-11', 'Oct-10', 'Mar-12',
       'Jan-13', 'Nov-09', 'May-11', 'Mar-10', 'Jun-10', 'Dec-11',
       'May-12', 'Feb-11', 'Aug-11', 'Sep-12', 'Apr-11', 'Aug-12'], 
      dtype='|S6')

Works on Ubuntu after Numpy upgrade:

Python 2.7.3 (default, Aug  1 2012, 05:14:39)

IPython 0.12.1 -- An enhanced Interactive Python.

In [1]: import numpy as np

In [2]: np.__version__
Out[2]: '1.7.0'

In [3]: f = open('test.txt','r')

In [4]: lines = np.array([line.strip() for line in f.readlines()])

In [5]: _, ilines = np.unique(lines, return_index=True)

In [6]: months = lines[np.sort(ilines)]

In [7]: months
Out[7]: 
array(['Aug-09', 'Sep-09', 'Oct-09', 'Nov-09', 'Dec-09', 'Jan-10',
       'Feb-10', 'Mar-10', 'Apr-10', 'May-10', 'Jun-10', 'Jul-10',
       'Aug-10', 'Sep-10', 'Oct-10', 'Nov-10', 'Dec-10', 'Jan-11',
       'Feb-11', 'Mar-11', 'Apr-11', 'May-11', 'Jun-11', 'Jul-11',
       'Aug-11', 'Sep-11', 'Oct-11', 'Nov-11', 'Dec-11', 'Jan-12',
       'Feb-12', 'Mar-12', 'Apr-12', 'May-12', 'Jun-12', 'Jul-12',
       'Aug-12', 'Sep-12', 'Oct-12', 'Nov-12', 'Dec-12', 'Jan-13'], 
      dtype='|S6')
