How to find indices of the intersection of two numpy arrays

Given numpy arrays a and b, it is fairly straightforward to find the indices of array a whose elements overlap with the elements of array b using the function numpy.in1d(). However, what if you also want to find the indices of the elements in b that overlap with a? I was unable to find a good solution to this, so I wrote a small function to do it. It first finds the elements of a that also occur in b, then does a brute-force search of b for each of them. If anyone knows of a better way, please let me know. The code snippet is below.

import numpy as np

def overlap(a, b):
    # Return the indices in a whose elements also occur in b, together
    # with the corresponding indices in b.
    # Only works if both a and b contain unique values!
    # This is not very efficient, but it works.
    bool_a = np.in1d(a,b)
    ind_a = np.arange(len(a))
    ind_a = ind_a[bool_a]

    ind_b = np.array([np.argwhere(b == a[x]) for x in ind_a]).flatten()
    return ind_a,ind_b
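
Because overlap() relies on both arrays holding unique values, it can help to check that before calling it. A minimal sketch of such a check; has_unique_values is a hypothetical helper name used only for illustration, not a NumPy function:

```python
import numpy as np

def has_unique_values(arr):
    # Hypothetical helper: True when no value in arr is repeated.
    # np.unique drops duplicates, so the sizes differ if any exist.
    return np.unique(arr).size == np.asarray(arr).size

a = np.array([1, 2, 4, 5])
b = np.array([4, 6, 4, 1])   # 4 appears twice

print(has_unique_values(a))  # True
print(has_unique_values(b))  # False
```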

Usage:

import numpy as np
import overlap  # assumes the function above is saved as overlap.py

a = np.array([1,2,4,5])
b = np.array([4,6,10,9,1])

ind_a, ind_b = overlap.overlap(a,b)
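
For readers on NumPy 1.15 or later, np.intersect1d accepts return_indices=True and hands back both index arrays directly. A minimal sketch on the same example arrays; note that it reports only the first occurrence of each common value, and the results are ordered by value rather than by position in a:

```python
import numpy as np

a = np.array([1, 2, 4, 5])
b = np.array([4, 6, 10, 9, 1])

# common holds the sorted shared values; ind_a and ind_b give the index of
# the first occurrence of each shared value in a and in b respectively.
common, ind_a, ind_b = np.intersect1d(a, b, return_indices=True)

print(common)  # [1 4]
print(ind_a)   # [0 2]
print(ind_b)   # [4 0]
```

Here a[ind_a] and b[ind_b] are equal element by element, which is the one-to-one pairing the function above produces for unique inputs.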

UPDATE (2016-09-28): Based on comments from Mike (see below), there is a different way to do this that accounts for non-unique elements. It is also likely to run faster. His suggestion is below:

def overlap_mbk(a, b):
    a1=np.argsort(a)
    b1=np.argsort(b)
    # use searchsorted:
    sort_left_a=a[a1].searchsorted(b[b1], side='left')
    sort_right_a=a[a1].searchsorted(b[b1], side='right')
    #
    sort_left_b=b[b1].searchsorted(a[a1], side='left')
    sort_right_b=b[b1].searchsorted(a[a1], side='right')


    # # which values are in b but not in a?
    # inds_b=(sort_right_a-sort_left_a == 0).nonzero()[0]
    # # which values are in a but not in b?
    # inds_a=(sort_right_b-sort_left_b == 0).nonzero()[0]

    # which values of b are also in a?
    inds_b=(sort_right_a-sort_left_a > 0).nonzero()[0]
    # which values of a are also in b?
    inds_a=(sort_right_b-sort_left_b > 0).nonzero()[0]

    return a1[inds_a], b1[inds_b]
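
One thing to keep in mind with repeated values: the two returned index arrays are not paired element by element, and they can even differ in length. The sketch below illustrates this with a condensed restatement of the function above (same logic, shorter variable names chosen for this example):

```python
import numpy as np

def overlap_mbk(a, b):
    # Condensed restatement of the searchsorted approach shown above.
    a1, b1 = np.argsort(a), np.argsort(b)
    sa, sb = a[a1], b[b1]
    # For each sorted value, count its occurrences in the other array.
    count_a_in_b = sb.searchsorted(sa, side='right') - sb.searchsorted(sa, side='left')
    count_b_in_a = sa.searchsorted(sb, side='right') - sa.searchsorted(sb, side='left')
    return a1[count_a_in_b > 0], b1[count_b_in_a > 0]

a = np.array([2, 1, 2, 5])   # 2 appears twice
b = np.array([2, 7, 1])

ind_a, ind_b = overlap_mbk(a, b)

# Every position of a shared value is reported, so a contributes three
# indices (for 1, 2 and 2) while b contributes only two (for 1 and 2).
print(len(ind_a), len(ind_b))  # 3 2
```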

2 thoughts on “How to find indices of the intersection of two numpy arrays”

  1. This isn’t rigorously tested, but numpy’s searchsorted method can do this quickly for non-unique inputs, too. Here’s some sample code (along with commented-out code that does the same thing for finding elements not in common) that runs ~3x faster for the example you gave. Hope it’s useful.

    %timeit -n 10000 overlap(a,b)
    10000 loops, best of 3: 28.3 µs per loop

    %timeit -n 10000 overlap_mbk(a,b)
    10000 loops, best of 3: 8.68 µs per loop

    def overlap_mbk(a, b):
        a1=np.argsort(a)
        b1=np.argsort(b)
        # use searchsorted:
        sort_left_a=a[a1].searchsorted(b[b1], side='left')
        sort_right_a=a[a1].searchsorted(b[b1], side='right')
        #
        sort_left_b=b[b1].searchsorted(a[a1], side='left')
        sort_right_b=b[b1].searchsorted(a[a1], side='right')

        # # which values are in b but not in a?
        # inds_b=(sort_right_a-sort_left_a == 0).nonzero()[0]
        # # which values are in a but not in b?
        # inds_a=(sort_right_b-sort_left_b == 0).nonzero()[0]

        # which values of b are also in a?
        inds_b=(sort_right_a-sort_left_a > 0).nonzero()[0]
        # which values of a are also in b?
        inds_a=(sort_right_b-sort_left_b > 0).nonzero()[0]

        return a1[inds_a], b1[inds_b]

    1. Thanks Mike! It’s great this works faster and for non-unique arrays. For my use case, I needed the elements to match one-to-one, so non-unique values are a problem.
