Pandas - Create Multiindex columns during rename
I'm trying to find a simple way to rename a flat column index to a hierarchical MultiIndex column set. I've come across one way, but it seems a bit kludgy. Is there a better way to do this in pandas?
#!/usr/bin/env python
import pandas as pd
import numpy as np
flat_df = pd.DataFrame(np.random.randint(0,100,size=(4, 4)), columns=list('ACBD'))
print(flat_df)
# A C B D
# 0 27 67 35 36
# 1 80 42 93 20
# 2 64 9 18 83
# 3 85 69 60 84
nested_columns = {
    'A': ('One', 'a'),
    'C': ('One', 'c'),
    'B': ('Two', 'b'),
    'D': ('Two', 'd'),
}
tuples = sorted(nested_columns.values(), key=lambda x: x[1]) # Sort by second value
nested_df = flat_df.sort_index(axis=1) # Sort dataframe by column name
nested_df.columns = pd.MultiIndex.from_tuples(tuples)
nested_df = nested_df.sort_index(level=0, axis=1) # Sort to group first level
print(nested_df)
# One Two
# a c b d
# 0 27 67 35 36
# 1 80 42 93 20
# 2 64 9 18 83
# 3 85 69 60 84
It seems a bit fragile to sort both the hierarchical column specification and the dataframe and assume they'll line up. Also, sorting three times seems ridiculous. The alternative I'd prefer would be something like nested_df = flat_df.rename(columns=nested_columns), but it seems that rename isn't able to go from flat column indexing to MultiIndex columns. Am I missing something?
Edit: Realized this would break if the tuples sorted by second value don't sort the same as the flat column names. Definitely the wrong approach.
Edit2:
In response to @wen's answer:
nested_df = flat_df.rename(columns=nested_columns)
print(nested_df)
# (One, a) (One, c) (Two, b) (Two, d)
# 0 18 0 51 48
# 1 69 68 78 24
# 2 2 20 99 46
# 3 1 80 11 11
2 Answers
You could try:
df.columns = pd.MultiIndex.from_tuples(df.rename(columns = nested_columns).columns)
df
Output:
One Two
a c b d
0 27 67 35 36
1 80 42 93 20
2 64 9 18 83
3 85 69 60 84
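A variant that avoids both sorting and rename is to look up each flat label in the mapping, in the dataframe's own column order, and build the MultiIndex from the resulting tuples. This is only a minimal sketch: it assumes every flat column has an entry in nested_columns, and it uses deterministic np.arange data instead of the random values from the question so the result is easy to check.

```python
import numpy as np
import pandas as pd

# Deterministic stand-in for the random data used in the question.
flat_df = pd.DataFrame(np.arange(16).reshape(4, 4), columns=list('ACBD'))

nested_columns = {
    'A': ('One', 'a'),
    'C': ('One', 'c'),
    'B': ('Two', 'b'),
    'D': ('Two', 'd'),
}

# Look up the tuple for each column in the order the columns already appear,
# so the new labels cannot get out of step with the data.
# A missing key raises KeyError instead of silently mislabelling a column.
nested_df = flat_df.copy()
nested_df.columns = pd.MultiIndex.from_tuples(
    [nested_columns[col] for col in flat_df.columns]
)

print(nested_df)
```

Because the tuples are generated from flat_df.columns itself, no sort is needed to keep labels and data aligned. Sorting with sort_index(axis=1) afterwards is optional and only affects how the columns are grouped.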
IIUC, rename:
flat_df.rename(columns=nested_columns)
Out[224]:
One Two
a c b d
0 36 19 53 46
1 17 85 63 36
2 40 80 75 86
3 31 83 75 16
Updated
flat_df.columns.map(nested_columns.get)
Out[15]:
MultiIndex(levels=[['One', 'Two'], ['a', 'b', 'c', 'd']],
labels=[[0, 0, 1, 1], [0, 2, 1, 3]])
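Since the comments report different results on different pandas versions, a defensive sketch is to pass the mapped labels through pd.MultiIndex.from_tuples explicitly and assign the result back to the columns. The explicit conversion is an assumption made here for robustness, not something the answer states is required. The data is again a deterministic stand-in for the question's random values.

```python
import numpy as np
import pandas as pd

# Deterministic stand-in for the random data used in the question.
flat_df = pd.DataFrame(np.arange(16).reshape(4, 4), columns=list('ACBD'))

nested_columns = {
    'A': ('One', 'a'),
    'C': ('One', 'c'),
    'B': ('Two', 'b'),
    'D': ('Two', 'd'),
}

# Map each flat label to its tuple. Depending on the pandas version this may
# come back as a MultiIndex or as a flat Index of tuples; listing the result
# yields plain tuples in both cases.
mapped = list(flat_df.columns.map(nested_columns.get))

# dict.get returns None for unknown labels, so check before converting.
if any(label is None for label in mapped):
    raise KeyError("some columns have no entry in nested_columns")

nested_df = flat_df.copy()
nested_df.columns = pd.MultiIndex.from_tuples(mapped)

print(nested_df.columns)
```

The None check matters because nested_columns.get does not raise on a missing key, so an unmapped column would otherwise only surface as a confusing error or label later on.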
@ScottBoston umm what version you have for pandas
– Wen
11 mins ago
pd.__version__ "0.23.3"
– Scott Boston
10 mins ago
@ScottBoston I have pd.__version__ Out[7]: '0.22.0'
– Wen
9 mins ago
@Wen First thing I tried. Sadly, didn't work with Pandas 0.23.3
– Nick Sweet
4 mins ago
Hm... I am getting tuples as headers instead of pd.MultiIndex when I try this.
– Scott Boston
12 mins ago