Given numpy arrays a and b, it is fairly straightforward find the indices of array a whose elements overlap with the elements of array b using the function numpy.in1d(). However, what if you want to also find the indices associated with elements in b that overlap with a? I’m unable to find a good solution to this, so I wrote a small function to do this. It does brute force matching after getting all the common elements of array a and b. If anyone knows of a better way, please let me know. The code snippet is below.
import numpy as np def overlap(a, b): # return the indices in a that overlap with b, also returns # the corresponding index in b only works if both a and b are unique! # This is not very efficient but it works bool_a = np.in1d(a,b) ind_a = np.arange(len(a)) ind_a = ind_a[bool_a] ind_b = np.array([np.argwhere(b == a[x]) for x in ind_a]).flatten() return ind_a,ind_b
Usage:
import overlap a = np.array([1,2,4,5]) b = np.array([4,6,10,9,1]) ind_a, ind_b = overlap.overlap(a,b)
UPDATE (2016-09-28): Based on comments from Mike (see below), there is different way to do this to account for non-unique elements. This also likely runs faster too. His suggestion is below:
def overlap_mbk(a, b): a1=np.argsort(a) b1=np.argsort(b) # use searchsorted: sort_left_a=a[a1].searchsorted(b[b1], side='left') sort_right_a=a[a1].searchsorted(b[b1], side='right') # sort_left_b=b[b1].searchsorted(a[a1], side='left') sort_right_b=b[b1].searchsorted(a[a1], side='right') # # which values are in b but not in a? # inds_b=(sort_right_a-sort_left_a == 0).nonzero()[0] # # which values are in b but not in a? # inds_a=(sort_right_b-sort_left_b == 0).nonzero()[0] # which values of b are also in a? inds_b=(sort_right_a-sort_left_a > 0).nonzero()[0] # which values of a are also in b? inds_a=(sort_right_b-sort_left_b > 0).nonzero()[0] return a1[inds_a], b1[inds_b]