Given numpy arrays *a* and *b*, it is fairly straightforward find the indices of array *a* whose elements overlap with the elements of array *b* using the function numpy.in1d(). However, what if you want to also find the indices associated with elements in *b* that overlap with *a*? I’m unable to find a good solution to this, so I wrote a small function to do this. It does brute force matching after getting all the common elements of array *a* and *b*. If anyone knows of a better way, please let me know. The code snippet is below.

import numpy as np def overlap(a, b): # return the indices in a that overlap with b, also returns # the corresponding index in b only works if both a and b are unique! # This is not very efficient but it works bool_a = np.in1d(a,b) ind_a = np.arange(len(a)) ind_a = ind_a[bool_a] ind_b = np.array([np.argwhere(b == a[x]) for x in ind_a]).flatten() return ind_a,ind_b

Usage:

import overlap a = np.array([1,2,4,5]) b = np.array([4,6,10,9,1]) ind_a, ind_b = overlap.overlap(a,b)

**UPDATE (2016-09-28):** Based on comments from Mike (see below), there is different way to do this to account for non-unique elements. This also likely runs faster too. His suggestion is below:

def overlap_mbk(a, b): a1=np.argsort(a) b1=np.argsort(b) # use searchsorted: sort_left_a=a[a1].searchsorted(b[b1], side='left') sort_right_a=a[a1].searchsorted(b[b1], side='right') # sort_left_b=b[b1].searchsorted(a[a1], side='left') sort_right_b=b[b1].searchsorted(a[a1], side='right') # # which values are in b but not in a? # inds_b=(sort_right_a-sort_left_a == 0).nonzero()[0] # # which values are in b but not in a? # inds_a=(sort_right_b-sort_left_b == 0).nonzero()[0] # which values of b are also in a? inds_b=(sort_right_a-sort_left_a > 0).nonzero()[0] # which values of a are also in b? inds_a=(sort_right_b-sort_left_b > 0).nonzero()[0] return a1[inds_a], b1[inds_b]

This isn’t rigorously tested, but using numpy’s searchsorted method can do this quickly for non-unique inputs, too. Here’s some some sample code (along with commented out code that will do the same thing for fining elements not in common) that runs ~3x faster for the example you gave. Hope it’s useful.

%timeit -n 10000 overlap(a,b)

10000 loops, best of 3: 28.3 µs per loop

%timeit -n 10000 overlap_mbk(a,b)

10000 loops, best of 3: 8.68 µs per loop

def overlap_mbk(a, b):

a1=np.argsort(a)

b1=np.argsort(b)

# use searchsorted:

sort_left_a=a[a1].searchsorted(b[b1], side=’left’)

sort_right_a=a[a1].searchsorted(b[b1], side=’right’)

#

sort_left_b=b[b1].searchsorted(a[a1], side=’left’)

sort_right_b=b[b1].searchsorted(a[a1], side=’right’)

# # which values are in b but not in a?

# inds_b=(sort_right_a-sort_left_a == 0).nonzero()[0]

# # which values are in b but not in a?

# inds_a=(sort_right_b-sort_left_b == 0).nonzero()[0]

# which values of b are also in a?

inds_b=(sort_right_a-sort_left_a > 0).nonzero()[0]

# which values of a are also in b?

inds_a=(sort_right_b-sort_left_b > 0).nonzero()[0]

return a1[inds_a], b1[inds_b]

Thanks Mike! It’s great this works faster and for non-unique arrays. For my use case, I needed the elements to match one-to-one, so non-unique values are a problem.