There are times when working with different pandas dataframes that you might need to get the data that is ‘different’ between the two dataframes (i.e., comparing two pandas dataframes and getting the differences). This seems like a straightforward issue, but apparently it’s still a popular ‘question’ for many people and is my most popular question on Stack Overflow.

As an example, let’s look at two pandas dataframes. Both have date indexes and the same structure. How can we compare these two dataframes and find which rows are in dataframe 2 that aren’t in dataframe 1?

dataframe 1 (named df1):

Date        Fruit   Num   Color
2013-11-24  Banana  22.1  Yellow
2013-11-24  Orange   8.6  Orange
2013-11-24  Apple    7.6  Green
2013-11-24  Celery  10.2  Green

dataframe 2 (named df2):

Date        Fruit   Num   Color
2013-11-24  Banana  22.1  Yellow
2013-11-24  Orange   8.6  Orange
2013-11-24  Apple    7.6  Green
2013-11-24  Celery  10.2  Green
2013-11-25  Apple   22.1  Red
2013-11-25  Orange   8.6  Orange
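To follow along, the two sample dataframes can be rebuilt in a few lines. This is only a sketch: it assumes `Date` is kept as an ordinary string column rather than set as the index, and the helper names `rows_df1`, `rows_new` and `columns` are made up for illustration.

```python
import pandas as pd

# Rows copied from the two tables above.
rows_df1 = [
    ("2013-11-24", "Banana", 22.1, "Yellow"),
    ("2013-11-24", "Orange", 8.6, "Orange"),
    ("2013-11-24", "Apple", 7.6, "Green"),
    ("2013-11-24", "Celery", 10.2, "Green"),
]
# The two extra rows that only appear in dataframe 2.
rows_new = [
    ("2013-11-25", "Apple", 22.1, "Red"),
    ("2013-11-25", "Orange", 8.6, "Orange"),
]
columns = ["Date", "Fruit", "Num", "Color"]

# Assumption: Date stays a regular column instead of becoming the index.
df1 = pd.DataFrame(rows_df1, columns=columns)
df2 = pd.DataFrame(rows_df1 + rows_new, columns=columns)
```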

The answer, it seems, is quite simple but I couldn’t figure it out at the time. Thanks to the generosity of Stack Overflow users, the answer (or at least an answer that works) is simply to concatenate the dataframes, then perform a group-by on all columns, and finally re-index to keep only the records that appear once.

Here’s the code (as provided by user alko on Stack Overflow):

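    import pandas as pd

    df = pd.concat([df1, df2])  # concat dataframes
    df = df.reset_index(drop=True)  # reset the index
    df_gpby = df.groupby(list(df.columns))  # group by all columns
    idx = [x[0] for x in df_gpby.groups.values() if len(x) == 1]  # labels of rows that appear only once
    df.reindex(idx)  # reindex to keep only those rows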

This simple approach leads to the correct answer:

         Date   Fruit   Num   Color
9  2013-11-25  Orange   8.6  Orange
8  2013-11-25   Apple  22.1     Red
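For anyone who wants to run the approach end to end, here is a minimal sketch that wraps the same steps in a function. The function name `rows_appearing_once` is a hypothetical label, not something from the original answer, and the sample frames assume `Date` is an ordinary column (which is what the output above suggests) rather than the index.

```python
import pandas as pd


def rows_appearing_once(df1, df2):
    """Return rows that occur exactly once across both frames (hypothetical helper)."""
    # Stack the frames and give every row a unique integer label.
    df = pd.concat([df1, df2]).reset_index(drop=True)
    # Group identical rows together; each group maps to the labels of its rows.
    groups = df.groupby(list(df.columns)).groups
    # Keep the label of every row whose group has a single member.
    idx = [labels[0] for labels in groups.values() if len(labels) == 1]
    # Sort the labels so the result order does not depend on group order.
    return df.reindex(sorted(idx))


columns = ["Date", "Fruit", "Num", "Color"]
shared = [
    ("2013-11-24", "Banana", 22.1, "Yellow"),
    ("2013-11-24", "Orange", 8.6, "Orange"),
    ("2013-11-24", "Apple", 7.6, "Green"),
    ("2013-11-24", "Celery", 10.2, "Green"),
]
extra = [
    ("2013-11-25", "Apple", 22.1, "Red"),
    ("2013-11-25", "Orange", 8.6, "Orange"),
]
df1 = pd.DataFrame(shared, columns=columns)
df2 = pd.DataFrame(shared + extra, columns=columns)

result = rows_appearing_once(df1, df2)
print(result)
```

Two caveats worth keeping in mind: the technique returns rows that appear once in the combined data, so a row found only in df1 would show up as well, and `groupby` skips rows with missing values in the grouping columns by default, so such rows would be silently left out.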

There are most likely more ‘pythonic’ answers, and I’d recommend you dig into those other approaches, but the above works, is easy to read, and is fast enough for my needs.
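As an illustration of what those other approaches might look like, here are two common alternatives. These are not the method from the Stack Overflow answer above; they are sketches using `drop_duplicates(keep=False)` and `merge(..., indicator=True)`, and they again assume `Date` is an ordinary column.

```python
import pandas as pd

columns = ["Date", "Fruit", "Num", "Color"]
shared = [
    ("2013-11-24", "Banana", 22.1, "Yellow"),
    ("2013-11-24", "Orange", 8.6, "Orange"),
    ("2013-11-24", "Apple", 7.6, "Green"),
    ("2013-11-24", "Celery", 10.2, "Green"),
]
extra = [
    ("2013-11-25", "Apple", 22.1, "Red"),
    ("2013-11-25", "Orange", 8.6, "Orange"),
]
df1 = pd.DataFrame(shared, columns=columns)
df2 = pd.DataFrame(shared + extra, columns=columns)

# Option 1: stack both frames and drop every row that has a duplicate.
# keep=False removes all copies, leaving rows that appear only once.
only_once = pd.concat([df1, df2]).drop_duplicates(keep=False)

# Option 2: left-merge df2 against df1 on all shared columns.
# indicator=True adds a "_merge" column saying where each row was found.
merged = df2.merge(df1, how="left", indicator=True)
only_in_df2 = merged[merged["_merge"] == "left_only"].drop(columns="_merge")

print(only_once)
print(only_in_df2)
```

The merge version answers the question as originally posed (rows in dataframe 2 that aren’t in dataframe 1), while the `drop_duplicates` version, like the group-by approach, returns rows unique to either frame.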

Want more information about pandas for data analysis? Check out the book Python for Data Analysis by the creator of pandas, Wes McKinney.

Author: Eric Brown

Eric D. Brown, D.Sc. has a doctorate in Information Systems with a specialization in Data Sciences, Decision Support and Knowledge Management. He writes about utilizing Python for data analytics at pythondata.com and the crossroads of technology and strategy at ericbrown.com.
