Pandas - Create Multiindex columns during rename

The name of the pictureThe name of the pictureThe name of the pictureClash Royale CLAN TAG#URR8PPP

Pandas - Create Multiindex columns during rename



I'm trying to find a simple way to rename a flat column index to a hierarchical multindex column set. I've come across one way, but it seems a bit kludgy - is there a better way to do this in Pandas?


#!/usr/bin/env python
import pandas as pd
import numpy as np

flat_df = pd.DataFrame(np.random.randint(0,100,size=(4, 4)), columns=list('ACBD'))

print flat_df

# A C B D
# 0 27 67 35 36
# 1 80 42 93 20
# 2 64 9 18 83
# 3 85 69 60 84


nested_columns = 'A': ('One', 'a'),
'C': ('One', 'c'),
'B': ('Two', 'b'),
'D': ('Two', 'd'),


tuples = sorted(nested_columns.values(), key=lambda x: x[1]) # Sort by second value
nested_df = flat_df.sort_index(axis=1) # Sort dataframe by column name
nested_df.columns = pd.MultiIndex.from_tuples(tuples)
nested_df = nested_df.sort_index(level=0, axis=1) # Sort to group first level

print nested_df

# One Two
# a c b d
# 0 27 67 35 36
# 1 80 42 93 20
# 2 64 9 18 83
# 3 85 69 60 84



It seems a bit fragile to sort both the hierarchical column specification as well as the dataframe and assume they'll line up. Also sorting three times seems ridiculous. The alternative I'd prefer would be something like nested_df = flat_df.rename(columns=nested_columns), but it seems that rename isn't able to go from flat column indexing to multiindex columns. Am I missing something?


nested_df = flat_df.rename(columns=nested_columns)


rename



Edit: Realized this would break if the tuples sorted by second value don't sort the same as the flat column names. Definitely the wrong approach.



Edit2:
In response to @wen's answer:


nested_df = flat_df.rename(columns=nested_columns)
print nested_df
# (One, a) (One, c) (Two, b) (Two, d)
# 0 18 0 51 48
# 1 69 68 78 24
# 2 2 20 99 46
# 3 1 80 11 11




2 Answers
2



You could try:


df.columns = pd.MultiIndex.from_tuples(df.rename(columns = nested_columns).columns)
df



Output:


One Two
a c b d
0 27 67 35 36
1 80 42 93 20
2 64 9 18 83
3 85 69 60 84



IIUC, rename


rename


flat_df.rename(columns=nested_columns)
Out[224]:
One Two
a c b d
0 36 19 53 46
1 17 85 63 36
2 40 80 75 86
3 31 83 75 16



Updated


flat_df.columns.map(nested_columns.get)
Out[15]:
MultiIndex(levels=[['One', 'Two'], ['a', 'b', 'c', 'd']],
labels=[[0, 0, 1, 1], [0, 2, 1, 3]])





Hm... I am getting tuples as headers instead of pd.MultiIndex when I try this.
– Scott Boston
12 mins ago





@ScottBoston umm what version you have for pandas
– Wen
11 mins ago





pd.__version__ "0.23.3"
– Scott Boston
10 mins ago





@ScottBoston I have pd.__version__ Out[7]: '0.22.0'
– Wen
9 mins ago





@Wen First thing I tried. Sadly, didn't work with Pandas 0.23.3
– Nick Sweet
4 mins ago






By clicking "Post Your Answer", you acknowledge that you have read our updated terms of service, privacy policy and cookie policy, and that your continued use of the website is subject to these policies.

Comments

Popular posts from this blog

Executable numpy error

PySpark count values by condition

Mass disable jenkins jobs