Sorting numpy array on multiple columns in Python

I am trying to sort the following array on column1, then column2 and then column3

[['2008' '1' '23' 'AAPL' 'Buy' '100']

 ['2008' '1' '30' 'AAPL' 'Sell' '100']

 ['2008' '1' '23' 'GOOG' 'Buy' '100']

 ['2008' '1' '30' 'GOOG' 'Sell' '100']

 ['2008' '9' '8' 'GOOG' 'Buy' '100']

 ['2008' '9' '15' 'GOOG' 'Sell' '100']

 ['2008' '5' '1' 'XOM' 'Buy' '100']

 ['2008' '5' '8' 'XOM' 'Sell' '100']]

I used the following code:

    idx=np.lexsort((order_array[:,2],order_array[:,1],order_array[:,0]))

    order_array=order_array[idx]

The resultant array is

[['2008' '1' '23' 'AAPL' 'Buy' '100']

 ['2008' '1' '23' 'GOOG' 'Buy' '100']

 ['2008' '1' '30' 'AAPL' 'Sell' '100']

 ['2008' '1' '30' 'GOOG' 'Sell' '100']

 ['2008' '5' '1' 'XOM' 'Buy' '100']

 ['2008' '5' '8' 'XOM' 'Sell' '100']

 ['2008' '9' '15' 'GOOG' 'Sell' '100']

 ['2008' '9' '8' 'GOOG' 'Buy' '100']]

The problem is that the last two rows are wrong. The correct array should have the last row as the second last one. I have tried everything but am not able to understand why this is happening. Will appreciate some help.

I am using the following code for obtaining order_array.

 for i in ….

    x= ldt_timestamps[i] # this is a list of timestamps

    s_sym=……

    list=[int(x.year),int(x.month),int(x.day),s_sym,'Buy',100]   

    rows_list.append(list) 



 order_array=np.array(rows_list)

edited Oct 3 '13 at 10:41

asked Oct 3 '13 at 10:10

user2842122


Possible duplicate of "Sorting a 2D numpy array by multiple axes". Use that answer, but use a dtype that makes sense for your data (not all strings), e.g. dt = [('y',np.uint32),('m',np.uint32),('d',np.uint32),('sym','S4'),('bs','S4'),('huh',np.uint32)]
– askewchan
Oct 3 '13 at 15:16


python sorting numpy

1 Answer

tl;dr: NumPy shines when doing numerical calculations on numerical arrays. Although sorting mixed data like this is possible (see below), NumPy is not well suited for it. You're probably better off using Pandas.

The cause of the problem:

The values are being sorted as strings. You need to sort them as ints.

In [7]: sorted(['15', '8'])

Out[7]: ['15', '8']



In [8]: sorted([15, 8])

Out[8]: [8, 15]

This happened because order_array contains strings. You need to convert those strings to ints where appropriate.

Converting dtypes from string-dtype to numerical dtype requires allocating space for a new array. Therefore, you would probably be better off revising the way you are creating order_array from the beginning.
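If the array already exists with a string dtype, one way to get the intended order is to cast only the sort keys to integers before handing them to np.lexsort. This is a minimal sketch, assuming the first three columns always hold integer-like strings; the sample rows are taken from the question, and the names keys and sorted_array are only illustrative:

```python
import numpy as np

# A few rows from the question, stored as strings just like order_array.
order_array = np.array([
    ['2008', '1', '23', 'AAPL', 'Buy', '100'],
    ['2008', '1', '30', 'AAPL', 'Sell', '100'],
    ['2008', '9', '8', 'GOOG', 'Buy', '100'],
    ['2008', '9', '15', 'GOOG', 'Sell', '100'],
    ['2008', '5', '1', 'XOM', 'Buy', '100'],
])

# Cast the key columns to int so that 8 sorts before 15.
keys = order_array[:, :3].astype(int)

# np.lexsort treats the LAST key as the primary one: day, month, year.
idx = np.lexsort((keys[:, 2], keys[:, 1], keys[:, 0]))
sorted_array = order_array[idx]
print(sorted_array[:, :3])
```

The cast allocates a small temporary integer array for the keys only; the rows themselves stay as strings.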

Interestingly, even though you converted the values to ints, when you call

order_array = np.array(rows_list)

NumPy by default creates a homogeneous array. In a homogeneous array every value has the same dtype. So NumPy tried to find a common dtype for all your values and chose a string dtype, thwarting the effort you put into converting the strings to ints!

You can check the dtype for yourself by inspecting order_array.dtype:

In [42]: order_array = np.array(rows_list)



In [43]: order_array.dtype

Out[43]: dtype('|S4')
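The promotion can be reproduced with two mixed rows. This is a small sketch; the exact string dtype depends on the Python and NumPy versions (on Python 3 it is a Unicode dtype rather than the byte-string '|S4' shown above), so the code only looks at the dtype kind:

```python
import numpy as np

rows_list = [[2008, 9, 8, 'GOOG', 'Buy', 100],
             [2008, 9, 15, 'GOOG', 'Sell', 100]]

order_array = np.array(rows_list)

# Every element, including the former ints, is now a string.
print(order_array.dtype.kind)   # 'U' on Python 3, 'S' on Python 2

# The day values are therefore compared as text: '8' versus '15'.
eight_before_fifteen = order_array[0, 2] < order_array[1, 2]
print(eight_before_fifteen)
```

Because '8' compares greater than '1' character by character, the day 8 ends up after the day 15, which is exactly the symptom in the question.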

Now, how do we fix this?

Using an object dtype:

The simplest way is to use an 'object' dtype

In [53]: order_array = np.array(rows_list, dtype='object')



In [54]: order_array

Out[54]: 

array([[2008, 1, 23, AAPL, Buy, 100],

       [2008, 1, 30, AAPL, Sell, 100],

       [2008, 1, 23, GOOG, Buy, 100],

       [2008, 1, 30, GOOG, Sell, 100],

       [2008, 9, 8, GOOG, Buy, 100],

       [2008, 9, 15, GOOG, Sell, 100],

       [2008, 5, 1, XOM, Buy, 100],

       [2008, 5, 8, XOM, Sell, 100]], dtype=object)

The problem here is that np.lexsort or np.sort do not work on arrays of
dtype object. To get around that problem, you could sort the rows_list
before creating order_array:

In [59]: import operator



In [60]: rows_list.sort(key=operator.itemgetter(0,1,2))  # sorts in place and returns None

In [61]: rows_list

Out[61]: 

[(2008, 1, 23, 'AAPL', 'Buy', 100),
 (2008, 1, 23, 'GOOG', 'Buy', 100),
 (2008, 1, 30, 'AAPL', 'Sell', 100),
 (2008, 1, 30, 'GOOG', 'Sell', 100),
 (2008, 5, 1, 'XOM', 'Buy', 100),
 (2008, 5, 8, 'XOM', 'Sell', 100),
 (2008, 9, 8, 'GOOG', 'Buy', 100),
 (2008, 9, 15, 'GOOG', 'Sell', 100)]



order_array = np.array(rows_list, dtype='object')
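The list-based approach can be tried without NumPy at all. This is a minimal sketch using a few of the question's rows in shuffled order; it uses sorted(), which returns a new list, whereas rows_list.sort() sorts in place and returns None. The name by_date is only illustrative:

```python
import operator

rows_list = [
    [2008, 9, 15, 'GOOG', 'Sell', 100],
    [2008, 1, 30, 'AAPL', 'Sell', 100],
    [2008, 9, 8, 'GOOG', 'Buy', 100],
    [2008, 5, 1, 'XOM', 'Buy', 100],
]

# Sort by year, then month, then day; the ints compare numerically.
by_date = sorted(rows_list, key=operator.itemgetter(0, 1, 2))
for row in by_date:
    print(row)
```

itemgetter(0, 1, 2) builds a (year, month, day) tuple for each row, and Python compares those tuples element by element.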

A better option would be to combine the first three columns into datetime.date objects:

import operator

import datetime as DT



for i in ...:

    seq = [DT.date(int(x.year), int(x.month), int(x.day)) ,s_sym, 'Buy', 100]   

    rows_list.append(seq)

rows_list.sort(key=operator.itemgetter(0,1,2))        

order_array = np.array(rows_list, dtype='object')



In [72]: order_array

Out[72]: 

array([[2008-01-23, AAPL, Buy, 100],
       [2008-01-23, GOOG, Buy, 100],
       [2008-01-30, AAPL, Sell, 100],
       [2008-01-30, GOOG, Sell, 100],
       [2008-05-01, XOM, Buy, 100],
       [2008-05-08, XOM, Sell, 100],
       [2008-09-08, GOOG, Buy, 100],
       [2008-09-15, GOOG, Sell, 100]], dtype=object)

Even though this is simple, I don't like NumPy arrays of dtype object.
You get neither the speed nor the memory space-saving gains of NumPy arrays with
native dtypes. At this point you might find working with a Python list of lists
faster and syntactically easier to deal with.

Using a structured array:

A more NumPy-ish solution which still offers speed and memory benefits is
to use a structured array (as opposed to homogeneous array). To make a
structured array with np.array you'll need to supply a dtype explicitly:

dt = [('year', '<i4'), ('month', '<i4'), ('day', '<i4'), ('symbol', '|S8'),
      ('action', '|S4'), ('value', '<i4')]

# each row must be a tuple for NumPy to treat it as one record
order_array = np.array([tuple(row) for row in rows_list], dtype=dt)



In [47]: order_array.dtype

Out[47]: dtype([('year', '<i4'), ('month', '<i4'), ('day', '<i4'), ('symbol', '|S8'), ('action', '|S4'), ('value', '<i4')])

To sort the structured array you could use the sort method:

order_array.sort(order=['year', 'month', 'day'])
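Put together, the structured-array route might look like the sketch below. It is an illustration under a few assumptions: the rows are a shuffled subset of the question's data, and the string fields use Unicode dtypes ('U8', 'U4') instead of the byte-string '|S8' and '|S4' above so that the values stay ordinary str on Python 3:

```python
import numpy as np

rows_list = [
    [2008, 9, 15, 'GOOG', 'Sell', 100],
    [2008, 1, 30, 'AAPL', 'Sell', 100],
    [2008, 9, 8, 'GOOG', 'Buy', 100],
    [2008, 5, 1, 'XOM', 'Buy', 100],
]

dt = [('year', 'i4'), ('month', 'i4'), ('day', 'i4'),
      ('symbol', 'U8'), ('action', 'U4'), ('value', 'i4')]

# Each record has to be a tuple; a list of lists is not read as records.
order_array = np.array([tuple(row) for row in rows_list], dtype=dt)

# In-place sort on the named integer fields, most significant first.
order_array.sort(order=['year', 'month', 'day'])
print(order_array['day'])
```

Unlike np.lexsort, the order argument lists the primary key first.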

To work with structured arrays, you'll need to know about some differences between homogeneous and structured arrays:

Your original homogeneous array was 2-dimensional. In contrast, a
structured array built from a list of records is 1-dimensional:

In [51]: order_array.shape

Out[51]: (8,)

If you index the structured array with an int or iterate through the array, you
get back rows:

In [52]: order_array[3]

Out[52]: (2008, 1, 30, 'GOOG', 'Sell', 100)

With homogeneous arrays you can access the columns with order_array[:, i].
With a structured array, you access them by name instead, e.g. order_array['year'].
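These differences are easy to see on a tiny array. The following is a hedged sketch with a reduced, hypothetical dtype (four fields only) and made-up variable names such as records and january:

```python
import numpy as np

dt = [('year', 'i4'), ('month', 'i4'), ('day', 'i4'), ('symbol', 'U8')]
records = np.array([(2008, 1, 23, 'AAPL'),
                    (2008, 1, 30, 'GOOG'),
                    (2008, 5, 1, 'XOM')], dtype=dt)

print(records.shape)       # one dimension: one entry per record

row = records[1]           # an integer index returns a single record
print(row['symbol'])       # fields of a record are accessed by name

days = records['day']      # a field name returns the whole column
print(days)

# A boolean mask built from one field selects whole records.
january = records[records['month'] == 1]
print(january['symbol'])
```

Each field behaves like an ordinary 1-dimensional array of its own dtype, so numerical operations on records['day'] keep NumPy's usual speed.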

Or, use Pandas:

If you can install Pandas, I think you might be happiest working with a Pandas DataFrame:

In [73]: df = pd.DataFrame(rows_list, columns=['date', 'symbol', 'action', 'value'])

In [75]: df.sort_values(['date'])  # df.sort(['date']) in pandas older than 0.17

Out[75]: 

         date symbol action  value

0  2008-01-23   AAPL    Buy    100

2  2008-01-23   GOOG    Buy    100

1  2008-01-30   AAPL   Sell    100

3  2008-01-30   GOOG   Sell    100

6  2008-05-01    XOM    Buy    100

7  2008-05-08    XOM   Sell    100

4  2008-09-08   GOOG    Buy    100

5  2008-09-15   GOOG   Sell    100

Pandas has useful functions for aligning timeseries by dates, filling in missing
values, grouping and aggregating/transforming rows or columns.
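As a self-contained illustration of the DataFrame route, here is a minimal sketch. It assumes a reasonably recent pandas (sort_values replaced the older DataFrame.sort method) and uses a shuffled subset of the question's rows with datetime.date objects in the first column:

```python
import datetime as DT
import pandas as pd

rows_list = [
    [DT.date(2008, 9, 15), 'GOOG', 'Sell', 100],
    [DT.date(2008, 1, 30), 'AAPL', 'Sell', 100],
    [DT.date(2008, 9, 8), 'GOOG', 'Buy', 100],
    [DT.date(2008, 5, 1), 'XOM', 'Buy', 100],
]

df = pd.DataFrame(rows_list, columns=['date', 'symbol', 'action', 'value'])

# sort_values returns a new DataFrame and keeps the original index labels.
by_date = df.sort_values(['date', 'symbol'])
print(by_date)
```

The mixed column types are not a problem here because each DataFrame column keeps its own dtype.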

Typically it is more useful to have a single date column instead of three integer-valued columns for the year, month, day.

If you need the year, month and day as separate columns for output, say to CSV, then you can replace the date column with year, month, day columns like this:

In [33]: df = df.join(df['date'].apply(lambda x: pd.Series([x.year, x.month, x.day], index=['year', 'month', 'day'])))



In [34]: del df['date']



In [35]: df

Out[35]: 

  symbol action  value  year  month  day

0   AAPL    Buy    100  2008      1   23

1   GOOG    Buy    100  2008      1   23

2   AAPL   Sell    100  2008      1   30

3   GOOG   Sell    100  2008      1   30

4    XOM    Buy    100  2008      5    1

5    XOM   Sell    100  2008      5    8

6   GOOG    Buy    100  2008      9    8

7   GOOG   Sell    100  2008      9   15
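An alternative way to split the date, sketched here under the assumption of a recent pandas version, is to convert the column with pd.to_datetime and read the components through the .dt accessor. The two sample rows are illustrative only:

```python
import datetime as DT
import pandas as pd

df = pd.DataFrame(
    [[DT.date(2008, 1, 23), 'AAPL', 'Buy', 100],
     [DT.date(2008, 9, 8), 'GOOG', 'Buy', 100]],
    columns=['date', 'symbol', 'action', 'value'])

# Convert to datetime64 once, then pull out the components via .dt
dates = pd.to_datetime(df['date'])
df['year'] = dates.dt.year
df['month'] = dates.dt.month
df['day'] = dates.dt.day

# Drop the combined column now that the parts are stored separately.
df = df.drop(columns='date')
print(df)
```

This avoids building one pd.Series per row, which the apply-based version does.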

Or, if you have no use for the 'date' column to begin with, you can of course leave rows_list alone and build the DataFrame with the year, month, day columns from the beginning. Sorting is still easy:

df.sort_values(['year', 'month', 'day'])

edited Oct 4 '13 at 10:58

answered Oct 3 '13 at 10:19

unutbu


thanks. but I am converting the strings to int. I have edited the question to include the code for creating order_array. look forward to your help
– user2842122
Oct 3 '13 at 10:42

@user2842122 - those 'ints' are being converted back to strings. unutbu - I think the simplest solution here might be to introduce a NumPy recarray, composed of a NumPy datetime object and your remaining string and integer data. There's a complete example here.
– Aron Ahmadia
Oct 3 '13 at 12:41

@AronAhmadia: Thank you for the comment! Yes, I was thinking of adding something like that, but I fear this answer is too long already, and Pandas is still probably the better way to go.
– unutbu
Oct 3 '13 at 12:44

When you've got a hammer in your hand everything looks like a nail :) I agree that as part of the SciPy stack, Pandas should be available and has a friendlier interface for this sort of work.
– Aron Ahmadia
Oct 3 '13 at 14:34

@unutbu: many thanks for the lucid explanation and the various solutions. Although the "rows_list.sort" solution seems the easiest for me to implement, i have taken your suggestion and used pandas- instead of numpy arrays- to resolve. I had one doubt though - for both the rows_list.sort and pandas solution you have first converted my three columns(yyyy,mm,dd) to one datetime column. Why is that? Is it so that the sorting can be done on just one column instead of three? Bcos it does create a problem since my final array has to have three columns (yyyy, mm, dd) instead of one.
– user2842122
Oct 4 '13 at 2:32

|
show 4 more comments

Your Answer

StackExchange.ifUsing("editor", function () {
StackExchange.using("externalEditor", function () {
StackExchange.using("snippets", function () {
StackExchange.snippets.init();
});
});
}, "code-snippets");

StackExchange.ready(function() {
var channelOptions = {
tags: "".split(" "),
id: "1"
};
initTagRenderer("".split(" "), "".split(" "), channelOptions);

StackExchange.using("externalEditor", function() {
// Have to fire editor after snippets, if snippets enabled
if (StackExchange.settings.snippets.snippetsEnabled) {
StackExchange.using("snippets", function() {
createEditor();
});
}
else {
createEditor();
}
});

function createEditor() {
StackExchange.prepareEditor({
heartbeatType: 'answer',
convertImagesToLinks: true,
noModals: true,
showLowRepImageUploadWarning: true,
reputationToPostImages: 10,
bindNavPrevention: true,
postfix: "",
imageUploader: {
brandingHtml: "Powered by u003ca class="icon-imgur-white" href="https://imgur.com/"u003eu003c/au003e",
contentPolicyHtml: "User contributions licensed under u003ca href="https://creativecommons.org/licenses/by-sa/3.0/"u003ecc by-sa 3.0 with attribution requiredu003c/au003e u003ca href="https://stackoverflow.com/legal/content-policy"u003e(content policy)u003c/au003e",
allowUrls: true
},
onDemand: true,
discardSelector: ".discard-answer"
,immediatelyShowMarkdownHelp:true
});

}
});

draft saved

draft discarded

Sign up or log in

StackExchange.ready(function () {
StackExchange.helpers.onClickDraftSave('#login-link');
});

Post as a guest

Name

Required, but never shown

StackExchange.ready(
function () {
StackExchange.openid.initPostLogin('.new-post-login', 'https%3a%2f%2fstackoverflow.com%2fquestions%2f19156472%2fsorting-numpy-array-on-multiple-columns-in-python%23new-answer', 'question_page');
}
);

Post as a guest

Name

Required, but never shown

1 Answer
1

active

oldest

votes

1 Answer
1

active

oldest

votes

up vote
9
down vote

tldr: NumPy shines when doing numerical calculations on numerical arrays. Although it is possible (see below) NumPy is not well suited for this. You're probably better off using Pandas.

The cause of the problem:

The values are being sorted as strings. You need to sort them as ints.

In [7]: sorted(['15', '8'])

Out[7]: ['15', '8']



In [8]: sorted([15, 8])

Out[8]: [8, 15]

This happened because order_array contains strings. You need to convert those strings to ints where appropriate.

Interestingly, even though you converted the values to ints, when you call

order_array = np.array(rows_list)

You can check the dtype for yourself by inspecting order_array.dtype:

In [42]: order_array = np.array(rows_list)



In [43]: order_array.dtype

Out[43]: dtype('|S4')

Now, how do we fix this?

Using an object dtype:

The simplest way is to use an 'object' dtype

In [53]: order_array = np.array(rows_list, dtype='object')



In [54]: order_array

Out[54]: 

array([[2008, 1, 23, AAPL, Buy, 100],

       [2008, 1, 30, AAPL, Sell, 100],

       [2008, 1, 23, GOOG, Buy, 100],

       [2008, 1, 30, GOOG, Sell, 100],

       [2008, 9, 8, GOOG, Buy, 100],

       [2008, 9, 15, GOOG, Sell, 100],

       [2008, 5, 1, XOM, Buy, 100],

       [2008, 5, 8, XOM, Sell, 100]], dtype=object)

The problem here is that np.lexsort or np.sort do not work on arrays of
dtype object. To get around that problem, you could sort the rows_list
before creating order_list:

In [59]: import operator



In [60]: rows_list.sort(key=operator.itemgetter(0,1,2))

Out[60]: 

[(2008, 1, 23, 'AAPL', 'Buy', 100),

 (2008, 1, 23, 'GOOG', 'Buy', 100),

 (2008, 1, 30, 'AAPL', 'Sell', 100),

 (2008, 1, 30, 'GOOG', 'Sell', 100),

 (2008, 5, 1, 'XOM', 'Buy', 100),

 (2008, 5, 8, 'XOM', 'Sell', 100),

 (2008, 9, 8, 'GOOG', 'Buy', 100),

 (2008, 9, 15, 'GOOG', 'Sell', 100)]



order_array = np.array(rows_list, dtype='object')

A better option would be to combine the first three columns into datetime.date objects:

import operator

import datetime as DT



for i in ...:

    seq = [DT.date(int(x.year), int(x.month), int(x.day)) ,s_sym, 'Buy', 100]   

    rows_list.append(seq)

rows_list.sort(key=operator.itemgetter(0,1,2))        

order_array = np.array(rows_list, dtype='object')



In [72]: order_array

Out[72]: 

array([[2008-01-23, AAPL, Buy, 100],

       [2008-01-30, AAPL, Sell, 100],

       [2008-01-23, GOOG, Buy, 100],

       [2008-01-30, GOOG, Sell, 100],

       [2008-09-08, GOOG, Buy, 100],

       [2008-09-15, GOOG, Sell, 100],

       [2008-05-01, XOM, Buy, 100],

       [2008-05-08, XOM, Sell, 100]], dtype=object)

Using a structured array:

dt = [('year', '<i4'), ('month', '<i4'), ('day', '<i4'), ('symbol', '|S8'),

      ('action', '|S4'), ('value', '<i4')]

order_array = np.array(rows_list, dtype=dt)



In [47]: order_array.dtype

Out[47]: dtype([('year', '<i4'), ('month', '<i4'), ('day', '<i4'), ('symbol', '|S8'), ('action', '|S4'), ('value', '<i4')])

To sort the structured array you could use the sort method:

order_array.sort(order=['year', 'month', 'day'])

To work with structured arrays, you'll need to know about some differences between homogenous and structured arrays:

Your original homogenous array was 2-dimensional. In contrast, all
structured arrays are 1-dimensional:

In [51]: order_array.shape

Out[51]: (8,)

If you index the structured array with an int or iterate through the array, you
get back rows:

In [52]: order_array[3]

Out[52]: (2008, 1, 30, 'GOOG', 'Sell', 100)

With homogeneous arrays you can access the columns with order_array[:, i]
Now, with a structured array, you access them by name: e.g. order_array['year'].

Or, use Pandas:

If you can install Pandas, I think you might be happiest working with a Pandas DataFrame:

In [73]: df = pd.DataFrame(rows_list, columns=['date', 'symbol', 'action', 'value'])

In [75]: df.sort(['date'])

Out[75]: 

         date symbol action  value

0  2008-01-23   AAPL    Buy    100

2  2008-01-23   GOOG    Buy    100

1  2008-01-30   AAPL   Sell    100

3  2008-01-30   GOOG   Sell    100

6  2008-05-01    XOM    Buy    100

7  2008-05-08    XOM   Sell    100

4  2008-09-08   GOOG    Buy    100

5  2008-09-15   GOOG   Sell    100

Pandas has useful functions for aligning timeseries by dates, filling in missing
values, grouping and aggregating/transforming rows or columns.

Typically it is more useful to have a single date column instead of three integer-valued columns for the year, month, day.

If you need the year, month, day as separate columns for the purpose of outputing, to say csv, then you can replace the date column with year, month, day columns like this:

In [33]: df = df.join(df['date'].apply(lambda x: pd.Series([x.year, x.month, x.day], index=['year', 'month', 'day'])))



In [34]: del df['date']



In [35]: df

Out[35]: 

  symbol action  value  year  month  day

0   AAPL    Buy    100  2008      1   23

1   GOOG    Buy    100  2008      1   23

2   AAPL   Sell    100  2008      1   30

3   GOOG   Sell    100  2008      1   30

4    XOM    Buy    100  2008      5    1

5    XOM   Sell    100  2008      5    8

6   GOOG    Buy    100  2008      9    8

7   GOOG   Sell    100  2008      9   15

df.sort(['year', 'month', 'day'])

edited Oct 4 '13 at 10:58

answered Oct 3 '13 at 10:19

unutbu

534k9811381208

thanks. but I am converting the strings to int. I have edited the question to include the code for creating order_array. look forward to your help
– user2842122
Oct 3 '13 at 10:42

@user2842122 - those 'ints' are being converted back to strings. unutbu - I think the simplest solution here might be to introduce a NumPy recarray, composed of a NumPy datetime object and your remaining string and integer data. There's a complete example here.
– Aron Ahmadia
Oct 3 '13 at 12:41

@AronAhmadia: Thank you for the comment! Yes, I was thinking of adding something like that, but I fear this answer is too long already, and Pandas is still probably the better way to go.
– unutbu
Oct 3 '13 at 12:44

When you've got a hammer in your hand everything looks like a nail :) I agree that as part of the SciPy stack, Pandas should be available and has a friendlier interface for this sort of work.
– Aron Ahmadia
Oct 3 '13 at 14:34

@unutbu: many thanks for the lucid explanation and the various solutions. Although the "rows_list.sort" solution seems the easiest for me to implement, i have taken your suggestion and used pandas- instead of numpy arrays- to resolve. I had one doubt though - for both the rows_list.sort and pandas solution you have first converted my three columns(yyyy,mm,dd) to one datetime column. Why is that? Is it so that the sorting can be done on just one column instead of three? Bcos it does create a problem since my final array has to have three columns (yyyy, mm, dd) instead of one.
– user2842122
Oct 4 '13 at 2:32

|
show 4 more comments

up vote
9
down vote

tldr: NumPy shines when doing numerical calculations on numerical arrays. Although it is possible (see below) NumPy is not well suited for this. You're probably better off using Pandas.

The cause of the problem:

The values are being sorted as strings. You need to sort them as ints.

In [7]: sorted(['15', '8'])

Out[7]: ['15', '8']



In [8]: sorted([15, 8])

Out[8]: [8, 15]

This happened because order_array contains strings. You need to convert those strings to ints where appropriate.

Interestingly, even though you converted the values to ints, when you call

order_array = np.array(rows_list)

You can check the dtype for yourself by inspecting order_array.dtype:

In [42]: order_array = np.array(rows_list)



In [43]: order_array.dtype

Out[43]: dtype('|S4')

Now, how do we fix this?

Using an object dtype:

The simplest way is to use an 'object' dtype

In [53]: order_array = np.array(rows_list, dtype='object')



In [54]: order_array

Out[54]: 

array([[2008, 1, 23, AAPL, Buy, 100],

       [2008, 1, 30, AAPL, Sell, 100],

       [2008, 1, 23, GOOG, Buy, 100],

       [2008, 1, 30, GOOG, Sell, 100],

       [2008, 9, 8, GOOG, Buy, 100],

       [2008, 9, 15, GOOG, Sell, 100],

       [2008, 5, 1, XOM, Buy, 100],

       [2008, 5, 8, XOM, Sell, 100]], dtype=object)

The problem here is that np.lexsort or np.sort do not work on arrays of
dtype object. To get around that problem, you could sort the rows_list
before creating order_list:

In [59]: import operator



In [60]: rows_list.sort(key=operator.itemgetter(0,1,2))

Out[60]: 

[(2008, 1, 23, 'AAPL', 'Buy', 100),

 (2008, 1, 23, 'GOOG', 'Buy', 100),

 (2008, 1, 30, 'AAPL', 'Sell', 100),

 (2008, 1, 30, 'GOOG', 'Sell', 100),

 (2008, 5, 1, 'XOM', 'Buy', 100),

 (2008, 5, 8, 'XOM', 'Sell', 100),

 (2008, 9, 8, 'GOOG', 'Buy', 100),

 (2008, 9, 15, 'GOOG', 'Sell', 100)]



order_array = np.array(rows_list, dtype='object')

A better option would be to combine the first three columns into datetime.date objects:

import operator

import datetime as DT



for i in ...:

    seq = [DT.date(int(x.year), int(x.month), int(x.day)) ,s_sym, 'Buy', 100]   

    rows_list.append(seq)

rows_list.sort(key=operator.itemgetter(0,1,2))        

order_array = np.array(rows_list, dtype='object')



In [72]: order_array

Out[72]: 

array([[2008-01-23, AAPL, Buy, 100],

       [2008-01-30, AAPL, Sell, 100],

       [2008-01-23, GOOG, Buy, 100],

       [2008-01-30, GOOG, Sell, 100],

       [2008-09-08, GOOG, Buy, 100],

       [2008-09-15, GOOG, Sell, 100],

       [2008-05-01, XOM, Buy, 100],

       [2008-05-08, XOM, Sell, 100]], dtype=object)

Using a structured array:

dt = [('year', '<i4'), ('month', '<i4'), ('day', '<i4'), ('symbol', '|S8'),

      ('action', '|S4'), ('value', '<i4')]

order_array = np.array(rows_list, dtype=dt)



In [47]: order_array.dtype

Out[47]: dtype([('year', '<i4'), ('month', '<i4'), ('day', '<i4'), ('symbol', '|S8'), ('action', '|S4'), ('value', '<i4')])

To sort the structured array you could use the sort method:

order_array.sort(order=['year', 'month', 'day'])

To work with structured arrays, you'll need to know about some differences between homogenous and structured arrays:

Your original homogenous array was 2-dimensional. In contrast, all
structured arrays are 1-dimensional:

In [51]: order_array.shape

Out[51]: (8,)

If you index the structured array with an int or iterate through the array, you
get back rows:

In [52]: order_array[3]

Out[52]: (2008, 1, 30, 'GOOG', 'Sell', 100)

With homogeneous arrays you can access the columns with order_array[:, i]
Now, with a structured array, you access them by name: e.g. order_array['year'].

Or, use Pandas:

If you can install Pandas, I think you might be happiest working with a Pandas DataFrame:

In [73]: df = pd.DataFrame(rows_list, columns=['date', 'symbol', 'action', 'value'])

In [75]: df.sort(['date'])

Out[75]: 

         date symbol action  value

0  2008-01-23   AAPL    Buy    100

2  2008-01-23   GOOG    Buy    100

1  2008-01-30   AAPL   Sell    100

3  2008-01-30   GOOG   Sell    100

6  2008-05-01    XOM    Buy    100

7  2008-05-08    XOM   Sell    100

4  2008-09-08   GOOG    Buy    100

5  2008-09-15   GOOG   Sell    100

Pandas has useful functions for aligning timeseries by dates, filling in missing
values, grouping and aggregating/transforming rows or columns.

Typically it is more useful to have a single date column instead of three integer-valued columns for the year, month, day.

If you need the year, month, day as separate columns for the purpose of outputing, to say csv, then you can replace the date column with year, month, day columns like this:

In [33]: df = df.join(df['date'].apply(lambda x: pd.Series([x.year, x.month, x.day], index=['year', 'month', 'day'])))



In [34]: del df['date']



In [35]: df

Out[35]: 

  symbol action  value  year  month  day

0   AAPL    Buy    100  2008      1   23

1   GOOG    Buy    100  2008      1   23

2   AAPL   Sell    100  2008      1   30

3   GOOG   Sell    100  2008      1   30

4    XOM    Buy    100  2008      5    1

5    XOM   Sell    100  2008      5    8

6   GOOG    Buy    100  2008      9    8

7   GOOG   Sell    100  2008      9   15

df.sort(['year', 'month', 'day'])

edited Oct 4 '13 at 10:58

answered Oct 3 '13 at 10:19

unutbu

534k9811381208

thanks. but I am converting the strings to int. I have edited the question to include the code for creating order_array. look forward to your help
– user2842122
Oct 3 '13 at 10:42

@user2842122 - those 'ints' are being converted back to strings. unutbu - I think the simplest solution here might be to introduce a NumPy recarray, composed of a NumPy datetime object and your remaining string and integer data. There's a complete example here.
– Aron Ahmadia
Oct 3 '13 at 12:41

@AronAhmadia: Thank you for the comment! Yes, I was thinking of adding something like that, but I fear this answer is too long already, and Pandas is still probably the better way to go.
– unutbu
Oct 3 '13 at 12:44

When you've got a hammer in your hand everything looks like a nail :) I agree that as part of the SciPy stack, Pandas should be available and has a friendlier interface for this sort of work.
– Aron Ahmadia
Oct 3 '13 at 14:34

@unutbu: many thanks for the lucid explanation and the various solutions. Although the "rows_list.sort" solution seems the easiest for me to implement, i have taken your suggestion and used pandas- instead of numpy arrays- to resolve. I had one doubt though - for both the rows_list.sort and pandas solution you have first converted my three columns(yyyy,mm,dd) to one datetime column. Why is that? Is it so that the sorting can be done on just one column instead of three? Bcos it does create a problem since my final array has to have three columns (yyyy, mm, dd) instead of one.
– user2842122
Oct 4 '13 at 2:32

|
show 4 more comments

up vote
9
down vote

tldr: NumPy shines when doing numerical calculations on numerical arrays. Although it is possible (see below) NumPy is not well suited for this. You're probably better off using Pandas.

The cause of the problem:

The values are being sorted as strings. You need to sort them as ints.

In [7]: sorted(['15', '8'])

Out[7]: ['15', '8']



In [8]: sorted([15, 8])

Out[8]: [8, 15]

This happened because order_array contains strings. You need to convert those strings to ints where appropriate.

Interestingly, even though you converted the values to ints, when you call

order_array = np.array(rows_list)

You can check the dtype for yourself by inspecting order_array.dtype:

In [42]: order_array = np.array(rows_list)



In [43]: order_array.dtype

Out[43]: dtype('|S4')

Now, how do we fix this?

Using an object dtype:

The simplest way is to use an 'object' dtype

In [53]: order_array = np.array(rows_list, dtype='object')



In [54]: order_array

Out[54]: 

array([[2008, 1, 23, AAPL, Buy, 100],

       [2008, 1, 30, AAPL, Sell, 100],

       [2008, 1, 23, GOOG, Buy, 100],

       [2008, 1, 30, GOOG, Sell, 100],

       [2008, 9, 8, GOOG, Buy, 100],

       [2008, 9, 15, GOOG, Sell, 100],

       [2008, 5, 1, XOM, Buy, 100],

       [2008, 5, 8, XOM, Sell, 100]], dtype=object)

The problem here is that np.lexsort or np.sort do not work on arrays of
dtype object. To get around that problem, you could sort the rows_list
before creating order_array:

In [59]: import operator



In [60]: rows_list.sort(key=operator.itemgetter(0,1,2))



In [61]: rows_list

Out[61]: 

[(2008, 1, 23, 'AAPL', 'Buy', 100),

 (2008, 1, 23, 'GOOG', 'Buy', 100),

 (2008, 1, 30, 'AAPL', 'Sell', 100),

 (2008, 1, 30, 'GOOG', 'Sell', 100),

 (2008, 5, 1, 'XOM', 'Buy', 100),

 (2008, 5, 8, 'XOM', 'Sell', 100),

 (2008, 9, 8, 'GOOG', 'Buy', 100),

 (2008, 9, 15, 'GOOG', 'Sell', 100)]



order_array = np.array(rows_list, dtype='object')
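If the object array already exists and you would rather not go back to rows_list, one alternative (not part of the approach above, just a sketch) is to build the row order in plain Python and then index the array with it. The rows below are a shortened, made-up subset of the data:

```python
import numpy as np

rows_list = [[2008, 9, 15, 'GOOG', 'Sell', 100],
             [2008, 9, 8, 'GOOG', 'Buy', 100],
             [2008, 1, 30, 'AAPL', 'Sell', 100],
             [2008, 1, 23, 'AAPL', 'Buy', 100]]
order_array = np.array(rows_list, dtype=object)

# Sort the row indices by the (year, month, day) tuple of each row.
# The values are real Python ints here, so 8 sorts before 15.
idx = sorted(range(len(order_array)), key=lambda i: tuple(order_array[i, :3]))
order_array = order_array[idx]
print(order_array)
```

This does the comparison in Python rather than in NumPy, so it is only reasonable for small arrays.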

A better option would be to combine the first three columns into datetime.date objects:

import operator

import datetime as DT



for i in ...:

    seq = [DT.date(int(x.year), int(x.month), int(x.day)), s_sym, 'Buy', 100]

    rows_list.append(seq)

rows_list.sort(key=operator.itemgetter(0,1,2))        

order_array = np.array(rows_list, dtype='object')



In [72]: order_array

Out[72]: 

array([[2008-01-23, AAPL, Buy, 100],

       [2008-01-23, GOOG, Buy, 100],

       [2008-01-30, AAPL, Sell, 100],

       [2008-01-30, GOOG, Sell, 100],

       [2008-05-01, XOM, Buy, 100],

       [2008-05-08, XOM, Sell, 100],

       [2008-09-08, GOOG, Buy, 100],

       [2008-09-15, GOOG, Sell, 100]], dtype=object)
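datetime.date objects compare chronologically, so a single date column is enough as a sort key, and the year, month and day can be recovered afterwards if the final output needs three columns. A minimal standard-library sketch with made-up rows; the name expanded is introduced here for illustration:

```python
import datetime as DT
import operator

rows_list = [
    [DT.date(2008, 9, 15), 'GOOG', 'Sell', 100],
    [DT.date(2008, 9, 8), 'GOOG', 'Buy', 100],
    [DT.date(2008, 1, 23), 'AAPL', 'Buy', 100],
]

# One key is enough: dates order by year, then month, then day.
rows_list.sort(key=operator.itemgetter(0))

# Expand each date back into separate year, month, day values.
expanded = [[d.year, d.month, d.day, sym, action, value]
            for d, sym, action, value in rows_list]
for row in expanded:
    print(row)
```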

Using a structured array:

dt = [('year', '<i4'), ('month', '<i4'), ('day', '<i4'), ('symbol', '|S8'),

      ('action', '|S4'), ('value', '<i4')]

# Structured dtypes need each row to be a tuple, not a list

order_array = np.array([tuple(row) for row in rows_list], dtype=dt)



In [47]: order_array.dtype

Out[47]: dtype([('year', '<i4'), ('month', '<i4'), ('day', '<i4'), ('symbol', '|S8'), ('action', '|S4'), ('value', '<i4')])

To sort the structured array you could use the sort method:

order_array.sort(order=['year', 'month', 'day'])
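Put together, the structured-array approach might look like the sketch below. It assumes Python 3, so it uses unicode fields ('U8', 'U4') in place of the byte-string fields above, and the rows are a shortened, made-up subset of the data:

```python
import numpy as np

rows_list = [[2008, 9, 15, 'GOOG', 'Sell', 100],
             [2008, 9, 8, 'GOOG', 'Buy', 100],
             [2008, 1, 30, 'AAPL', 'Sell', 100],
             [2008, 1, 23, 'AAPL', 'Buy', 100]]

dt = [('year', '<i4'), ('month', '<i4'), ('day', '<i4'),
      ('symbol', 'U8'), ('action', 'U4'), ('value', '<i4')]

# Each record must be a tuple when a structured dtype is given.
order_array = np.array([tuple(row) for row in rows_list], dtype=dt)

# In-place sort; the fields listed in `order` are compared in that sequence.
order_array.sort(order=['year', 'month', 'day'])
print(order_array)
```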

To work with structured arrays, you'll need to know about some differences between homogeneous and structured arrays:

Your original homogeneous array was 2-dimensional. In contrast, all
structured arrays are 1-dimensional:

In [51]: order_array.shape

Out[51]: (8,)

If you index the structured array with an int or iterate through the array, you
get back rows:

In [52]: order_array[3]

Out[52]: (2008, 1, 30, 'GOOG', 'Sell', 100)

With homogeneous arrays you can access the columns with order_array[:, i].
With a structured array, you access them by name instead: e.g. order_array['year'].
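The differences listed above can be tried out on a small structured array. This is only a sketch with made-up rows and unicode fields (assuming Python 3); the variable names days, goog and first_symbol are introduced here for illustration:

```python
import numpy as np

dt = [('year', '<i4'), ('month', '<i4'), ('day', '<i4'),
      ('symbol', 'U8'), ('action', 'U4'), ('value', '<i4')]
order_array = np.array([(2008, 1, 23, 'AAPL', 'Buy', 100),
                        (2008, 9, 8, 'GOOG', 'Buy', 100),
                        (2008, 9, 15, 'GOOG', 'Sell', 100)], dtype=dt)

print(order_array.ndim)              # structured arrays are 1-dimensional

days = order_array['day']            # a "column": plain 1-D int array
goog = order_array[order_array['symbol'] == 'GOOG']  # filter rows by a field

first = order_array[0]               # a "row": one record
first_symbol = first['symbol']       # fields of a record are also named
print(days, len(goog), first_symbol)
```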

Or, use Pandas:

If you can install Pandas, I think you might be happiest working with a Pandas DataFrame:

In [73]: df = pd.DataFrame(rows_list, columns=['date', 'symbol', 'action', 'value'])

In [75]: df.sort_values(['date'])

Out[75]: 

         date symbol action  value

0  2008-01-23   AAPL    Buy    100

2  2008-01-23   GOOG    Buy    100

1  2008-01-30   AAPL   Sell    100

3  2008-01-30   GOOG   Sell    100

6  2008-05-01    XOM    Buy    100

7  2008-05-08    XOM   Sell    100

4  2008-09-08   GOOG    Buy    100

5  2008-09-15   GOOG   Sell    100

Pandas has useful functions for aligning timeseries by dates, filling in missing
values, grouping and aggregating/transforming rows or columns.

Typically it is more useful to have a single date column instead of three integer-valued columns for the year, month, day.

If you need the year, month, day as separate columns for output (to CSV, say), then you can replace the date column with year, month, day columns like this:

In [33]: df = df.join(df['date'].apply(lambda x: pd.Series([x.year, x.month, x.day], index=['year', 'month', 'day'])))



In [34]: del df['date']



In [35]: df

Out[35]: 

  symbol action  value  year  month  day

0   AAPL    Buy    100  2008      1   23

1   GOOG    Buy    100  2008      1   23

2   AAPL   Sell    100  2008      1   30

3   GOOG   Sell    100  2008      1   30

4    XOM    Buy    100  2008      5    1

5    XOM   Sell    100  2008      5    8

6   GOOG    Buy    100  2008      9    8

7   GOOG   Sell    100  2008      9   15

df.sort_values(['year', 'month', 'day'])
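On a recent pandas release, the same workflow could be sketched as follows: sort on the single date column, then split it into year, month and day only at the end. This assumes a pandas version that has sort_values and the .dt accessor; the rows are made up and the name out is introduced here for illustration:

```python
import datetime as DT
import pandas as pd

rows_list = [
    [DT.date(2008, 9, 15), 'GOOG', 'Sell', 100],
    [DT.date(2008, 1, 23), 'AAPL', 'Buy', 100],
    [DT.date(2008, 9, 8), 'GOOG', 'Buy', 100],
]
df = pd.DataFrame(rows_list, columns=['date', 'symbol', 'action', 'value'])

# Convert to a datetime column so the .dt accessor is available, then sort.
df['date'] = pd.to_datetime(df['date'])
df = df.sort_values('date').reset_index(drop=True)

# Split the date into three integer columns for output.
df['year'] = df['date'].dt.year
df['month'] = df['date'].dt.month
df['day'] = df['date'].dt.day
out = df.drop(columns='date')
print(out)
```

Sorting on the one date column and expanding afterwards gives the same order as sorting on year, month and day separately.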

edited Oct 4 '13 at 10:58

answered Oct 3 '13 at 10:19

unutbu

534k9811381208


thanks. but I am converting the strings to int. I have edited the question to include the code for creating order_array. look forward to your help
– user2842122
Oct 3 '13 at 10:42

@user2842122 - those 'ints' are being converted back to strings. unutbu - I think the simplest solution here might be to introduce a NumPy recarray, composed of a NumPy datetime object and your remaining string and integer data. There's a complete example here.
– Aron Ahmadia
Oct 3 '13 at 12:41

@AronAhmadia: Thank you for the comment! Yes, I was thinking of adding something like that, but I fear this answer is too long already, and Pandas is still probably the better way to go.
– unutbu
Oct 3 '13 at 12:44


