Pandas DataFrame groupBy and then COUNT DISTINCT

Code description

This code snippet shows you how to group a pandas DataFrame via certain column and then do a distinct count of unique values in another column.

It is similar as COUNT DISTINCT aggregation functions in SQL. It also sort the values by the group by column.

Example output:

       category  value  user  group-count
    0         A      0     5            7
    80        A     80     4            7
    70        A     70     7            7
    60        A     60    10            7
    50        A     50     9            7
    ..      ...    ...   ...          ...
    29        J     29     9            7
    19        J     19     9            7
    9         J      9     9            7
    89        J     89     8            7
    99        J     99     7            7
    
    [100 rows x 4 columns]

Code snippet

    import pandas as pd
    import random
    
    categories = []
    users = []
    values = []
    for i in range(0, 100):
        categories.append(chr(i % 10+65))
        values.append(i)
        users.append(random.randint(1, 10))
    
    df = pd.DataFrame({'category': categories, 'value': values, 'user': users})
    print(df)
    
    df['group-count'] = df.groupby(by='category')['user'].transform('nunique')
    
    print(df.sort_values(by=['category']))

Code description

Code snippet

In this article