How to find indices of the intersection of two numpy arrays

Given numpy arrays a and b, it is fairly straightforward to find the indices of a whose elements overlap with the elements of b using the function numpy.in1d(). However, what if you also want the indices of the elements in b that overlap with a? I was unable to find a good solution for this, so I wrote a small function to do it. It does brute-force matching after getting all the common elements of a and b. If anyone knows of a better way, please let me know. The code snippet is below.

import numpy as np

def overlap(a, b):
    # Return the indices in a that overlap with b, along with
    # the corresponding indices in b. Only works if both a and b are unique!
    # This is not very efficient, but it works.
    bool_a = np.in1d(a,b)
    ind_a = np.arange(len(a))
    ind_a = ind_a[bool_a]

    ind_b = np.array([np.argwhere(b == a[x]) for x in ind_a]).flatten()
    return ind_a,ind_b

Usage:

import numpy as np
import overlap  # assumes the function above is saved in overlap.py

a = np.array([1,2,4,5])
b = np.array([4,6,10,9,1])

ind_a, ind_b = overlap.overlap(a,b)
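With these inputs the common elements are 1 and 4, so the function should return ind_a = [0, 2] and ind_b = [4, 0]; that is, a[ind_a] and b[ind_b] both give [1, 4].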

UPDATE (2016-09-28): Based on comments from Mike (see below), there is a different way to do this that accounts for non-unique elements. It likely runs faster as well. His suggestion is below:

def overlap_mbk(a, b):
    a1=np.argsort(a)
    b1=np.argsort(b)
    # use searchsorted:
    sort_left_a=a[a1].searchsorted(b[b1], side='left')
    sort_right_a=a[a1].searchsorted(b[b1], side='right')
    #
    sort_left_b=b[b1].searchsorted(a[a1], side='left')
    sort_right_b=b[b1].searchsorted(a[a1], side='right')


    # # which values are in b but not in a?
    # inds_b=(sort_right_a-sort_left_a == 0).nonzero()[0]
    # # which values are in a but not in b?
    # inds_a=(sort_right_b-sort_left_b == 0).nonzero()[0]

    # which values of b are also in a?
    inds_b=(sort_right_a-sort_left_a > 0).nonzero()[0]
    # which values of a are also in b?
    inds_a=(sort_right_b-sort_left_b > 0).nonzero()[0]

    return a1[inds_a], b1[inds_b]
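As a quick check, calling it on the same example arrays as above (a minimal sketch; the indices come back ordered by the sorted element values):

ind_a, ind_b = overlap_mbk(a, b)
# with a = [1,2,4,5] and b = [4,6,10,9,1], this should again give
# ind_a = [0, 2] and ind_b = [4, 0]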

Hours between sunrise and sunset at different observatories with python

By T. Do

[Figure: sunset_to_sunrise]

The plot above shows the number of hours between sunrise and sunset at three observatory sites around the world. Mauna Kea, Hawaii is home to a number of telescopes, including the Keck Telescopes, Gemini, and Subaru. Kitt Peak is a mountain outside of Tucson and hosts the Mayall and WIYN telescopes. Cerro Paranal is a mountain in Chile with the four VLT unit telescopes. The variation in nighttime hours depends on the latitude of the location on Earth. Because Mauna Kea in Hawaii is closer to the equator, there is less variation in its number of nighttime hours.

To create this plot, I used the Python packages Astral and datetime. Below I will discuss in more detail how to use these packages.
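As a rough sketch of the approach (assuming the Astral 2.x API and approximate coordinates for Mauna Kea; older versions of Astral use a different interface), the number of daylight hours over a year can be computed like this:

import datetime
from astral import LocationInfo
from astral.sun import sun

# approximate coordinates for Mauna Kea (for illustration only)
mauna_kea = LocationInfo("Mauna Kea", "USA", "US/Hawaii", 19.82, -155.47)

daylight_hours = []
for day in range(365):
    # loop over one (arbitrary) year, one date at a time
    date = datetime.date(2016, 1, 1) + datetime.timedelta(days=day)
    s = sun(mauna_kea.observer, date=date)
    # hours between sunrise and sunset for this date
    daylight_hours.append((s['sunset'] - s['sunrise']).total_seconds() / 3600.0)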

Continue reading “Hours between sunrise and sunset at different observatories with python”

Moore’s Law for Telescopes: Light collecting area of telescopes doubles every 20 years

by T. Do

[Figure: telescope_over_time]

One of the most important changes in telescopes over time is their size. Larger telescopes collect more light, which lets us see fainter and farther objects. Galileo's first telescope had a 1.5 cm diameter opening to let light through its lenses, slightly smaller than a dime. Using this telescope he discovered the moons of Jupiter. Today, the largest optical and near-infrared telescopes are imaging Jupiter-sized planets in other solar systems and seeing light from some of the first galaxies. The next generation of telescopes coming in about 10 years, such as TMT, E-ELT, and GMT, is predicted to see the first stars in the universe.

This is possible through the tremendous pace of technological progress. The plot above uses data tabulated on Wikipedia to show the light collecting area of large telescopes at the time they were built. Between Galileo's telescope and the Keck telescopes, telescope diameters have grown by over a factor of 600, while collecting area has grown by over 400,000 times. The line in the plot is an exponential fit to the relationship between the time a telescope was completed and its collecting area (Area = A0 exp(time/tau)). This means that the collecting area of telescopes has doubled roughly once every 20 years over the 400-year history of telescopes. The trend is not always smooth, since there can be a large jump in mirror size between generations of telescopes. For example, the jump from Keck (1993) to TMT (2024) represents a change from 10 m to 30 m, which corresponds to a factor of 9 increase in area in 31 years.
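As a sketch of how such a fit can be done (the arrays below are rough illustrative values, not the actual Wikipedia data), a straight-line fit to log(area) versus completion year gives the e-folding time tau, and the doubling time is tau * ln(2):

import numpy as np

# rough placeholder values: completion year and collecting area (m^2)
year = np.array([1609., 1789., 1917., 1948., 1993.])
area = np.array([1.8e-4, 1.1, 5.1, 20., 76.])

# fit log(A) = log(A0) + t/tau as a straight line in t
slope, intercept = np.polyfit(year, np.log(area), 1)
tau = 1.0 / slope
doubling_time = tau * np.log(2.0)
print('area doubles every %.1f years' % doubling_time)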

Near-infrared stellar absorption line list from Arcturus

T. Do

The star Arcturus is among the brightest infrared sources in the sky, which makes it possible to observe the star at very high spectral resolution. The infrared spectrum of Arcturus serves as a standard for our understanding of red giants (Arcturus is a K0III star). A very comprehensive spectral atlas of Arcturus was published by Hinkle, Wallace, and Livingston (1995). The observed spectrum is available through VizieR. Here, for convenience, I've converted the appendices of atomic and molecular lines into comma-separated text files and converted the line centroids into microns. Lines are at vacuum wavelengths.
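(For reference, if the original line centroids are given as wavenumbers in cm^-1, the conversion to vacuum wavelength in microns is simply lambda[micron] = 10^4 / sigma[cm^-1].)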

To download the lists: atomic line list and molecular line list.

For convenience, I’ve also included the atomic line list in an HTML table at the end of the post. It is a very long table, so it will be below the fold. If you have trouble viewing it, check your JavaScript settings.

Continue reading “Near-infrared stellar absorption line list from Arcturus”

Comparison of Standard Solar Compositions

T. Do

The standard solar composition was updated in Grevesse, Asplund, & Sauval 2007 (GAS07), which has significantly lower abundances for some elements compared with the previous standard reference, Grevesse & Sauval 1998 (GS98). However, many papers still report abundance measurements such as [Fe/H] relative to GS98. For example, Fe in GAS07 is 0.05 dex lower than in GS98. To facilitate converting abundance measurements between GS98 and GAS07, the following are excerpts from the tables of elemental abundances.

The abundance of an element is defined relative to hydrogen: A_el = log(N_el/N_H) + 12.0 (e.g. hydrogen is defined as A_H = 12.0). The following table shows the photospheric values from the two references.

Element A_el (GAS07) A_el (GS98)
H 12.0 12.0
He 10.93 +- 0.01 10.93 +- 0.004
Li 1.05 +- 0.10 1.10 +- 0.10
C 8.39 +- 0.05 8.52 +- 0.06
N 7.78 +- 0.06 7.92 +- 0.06
O 8.66 +- 0.05 8.83 +- 0.06
Na 6.17 +- 0.04 6.36 +- 0.02
Ca 6.31 +- 0.04 8.83 +- 0.06
Fe 7.45 +- 0.05 7.50 +- 0.05
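As a worked example of using this table: since [X/H] = A_star - A_sun, a measurement reported relative to GS98 can be placed on the GAS07 scale via [X/H]_GAS07 = [X/H]_GS98 + (A_GS98 - A_GAS07). For iron, A_GS98 - A_GAS07 = 7.50 - 7.45 = 0.05, so [Fe/H]_GAS07 = [Fe/H]_GS98 + 0.05.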

Testing the MOSFIRE wavelength solution: comparison of arc lamp wavelength solution to sky lines

Tuan Do

In order to determine the accuracy of the MOSFIRE wavelength solution, I compared the solution determined using the Ne and Ar lamps with the expected positions of OH sky lines. This experiment tests a specific night and configuration, so it is not meant to be comprehensive.

The dataset comes from observations of the Galactic center on 2012-05-29 and was reduced with an unofficial development version of the MOSFIRE pipeline that was modified to use the arcs to compute the wavelength solution.

[Figure: mosfire_raw_rectified]
One of the reduced, rectified slits without sky subtraction. The rectification is based on the Ne and Ar lamp wavelength solution.

Continue reading “Testing the MOSFIRE wavelength solution: comparison of arc lamp wavelength solution to sky lines”

Changing the way natbib displays citations: punctuations

By default, when using \citep with natbib, the output is something like (Author et al., 1999), with a comma between the author and the date. This is not the same format as that of ApJ, which does not include the extra comma. Using aastex or emulateapj should usually take care of this, but if you need to use natbib alone, you can change the format using the "aysep={char}" option of \setcitestyle:

\setcitestyle{aysep={}}

More info about this method here: gking.harvard.edu/files/natnotes2.pdf
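For context, a minimal preamble sketch (assuming a recent version of natbib, which provides \setcitestyle):

\documentclass{article}
\usepackage{natbib}
\setcitestyle{aysep={}}  % remove the comma between author and year

With this setting, \citep{author99} produces (Author et al. 1999) instead of (Author et al., 1999).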

More generally, you can use \bibpunct to change more of the punctuation. This command requires specification of six different components:

  1. the opening bracket symbol, default = (
  2. the closing bracket symbol, default = )
  3. the punctuation between multiple citations, default = ;
  4. the letter `n' for numerical style, or `s' for numerical superscript style; any other letter for author-year, default = author-year
  5. the punctuation that comes between the author names and the year, default = ,
  6. the punctuation that comes between years or numbers when common author lists are suppressed, default = ,

The optional argument is the character preceding a post-note, default is a comma plus space. In redefining this character, one must include a space if one is wanted.

Example 1: \bibpunct{[}{]}{,}{a}{}{;} changes the output of

\citep{jon90,jon91,jam92}

into [Jones et al. 1990; 1991, James et al. 1992].

Example 2: \bibpunct[; ]{(}{)}{,}{a}{}{;} changes the output of

\citep[and references therein]{jon90}

into (Jones et al. 1990; and references therein).

See http://merkel.zoneo.net/Latex/natbib.php for more info. 

A brief statistical analysis of the Canadian penny phase-out: Are stores now charging more because of rounding?

On February 4th, 2013, the Royal Canadian Mint stopped distributing pennies. This means that, effectively, cash transactions will soon have to be rounded to the nearest 5 cents. For example: $1.01 becomes $1.00, $1.03 becomes $1.05, and $1.08 becomes $1.10 (official Canadian guide: http://www.fin.gc.ca/1cent/index-eng.asp). In principle, if the price of an item were random, then on average all transactions should even out: sometimes the rounding will be to your advantage, sometimes to the store’s. However, prices are not random, as walking through a supermarket will tell you. Very often, prices end in 99 cents. In the worst-case scenario, if you only ever make one cash transaction at a time and buy items whose prices end in 99 cents, you will be charged a penny extra each time (assuming the item is not taxed, like many food items). Generally, this would not be much of a problem, unless there is a systematic statistical bias overall. That would mean that while the rounding procedure is meant to be done in a “fair and transparent” manner, people would end up paying more inadvertently.
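To make the rounding rule concrete, here is a minimal sketch of nickel rounding on an integer number of cents (the function name is just for illustration):

def round_to_nickel(cents):
    # round an integer number of cents to the nearest 5 cents
    # (1 and 2 round down, 3 and 4 round up, as in the official scheme)
    return int(5 * round(cents / 5.0))

# the examples above: 101 -> 100, 103 -> 105, 108 -> 110
print([round_to_nickel(c) for c in (101, 103, 108)])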

Continue reading “A brief statistical analysis of the Canadian penny phase-out: Are stores now charging more because of rounding?”

Using emulateapj: multi-page tables and deluxetable

emulateapj should work fine with deluxetable. For tables spanning one column, use the normal deluxetable environment; for tables spanning both columns, use deluxetable*.
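For example, a minimal two-column-wide table skeleton might look like the following (the column layout and contents are placeholders):

\begin{deluxetable*}{lcc}
\tablecaption{Example table\label{tab:example}}
\tablehead{\colhead{Name} & \colhead{X} & \colhead{Y}}
\startdata
A & 1.0 & 2.0 \\
B & 3.0 & 4.0
\enddata
\end{deluxetable*}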

To use multi-page tables, follow the instructions given in http://hea-www.harvard.edu/~alexey/emulateapj/emulateapj.cls:


%% 3) Multi-page tables cannot be set properly inside the main text; you
%% need to move the table to the end of the paper (after the references) and
%% issue the command \LongTables before it.

\LongTables
\begin{deluxetable*}
....
\end{deluxetable*}

If you need landscape tables, you will also need to use the lscape package, along with forcing emulateapj to use revtex4 in the preamble (using \rotate inside of deluxetable will not work):


\documentclass[iop,revtex4]{emulateapj}

You also must put the table at the end of the file with the other tables. For example usage, see the following note from emulateapj.cls:


%% 2) The {deluxetable} environment is re-implemented (the problem with
%% the aastex's deluxetable is it does not float). There is also a new
%% environment {deluxetable*} (absent in aastex) to set a floating table
%% two-column wide. Known problems:
%% (a) \rotate doesn't work (too difficult to implement). However,
%% you can use revtex's turnpage environment
%% - load package lscape (\usepackage{lscape} in the header)
%% - move table at the end of the paper after references
%% - \clearpage before the table
%% - \LongTables if the table will span more than 1 page (see next item)
%% - put the table inside the landscape environment and \clearpage
%% at the end:
%% \clearpage
%% \LongTables % optionally
%% \begin{landscape}
%% \begin{deluxetable}
%% ....
%% \end{deluxetable}
%% \clearpage
%% \end{landscape}
%%

Python script to extract pages from a postscript file

This script will examine a PostScript file and output individual pages as separate ps files. This is useful for splitting out pages that contain figures.


import subprocess
import sys
import numpy as np
import os

if (len(sys.argv) > 1):
    fileName = sys.argv[1]
    print fileName

    # count the pages by grepping for the PostScript %%Page: markers
    callStr = "-c %%Page: "+fileName
    #print 'calling: '+callStr

    process = subprocess.Popen(['grep '+callStr], shell=True, stdout=subprocess.PIPE)
    pages = process.communicate()
    pages = int(pages[0])

    print '%i pages to extract' % (pages)
    for i in np.arange(1, pages+1):
        # build the output name, e.g. figure.ps -> figure_page1.ps
        tempSplit = os.path.splitext(fileName)
        tempName = tempSplit[0]+'_page'+str(i)+tempSplit[1]
        print 'Outputting file: '+tempName
        # psselect (from psutils) extracts page i into its own file
        subprocess.call(['psselect', '-p'+str(i), fileName, tempName])
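If the script above is saved as, say, extract_pages.py (the name is arbitrary), it can be run as python extract_pages.py figure.ps, which writes out figure_page1.ps, figure_page2.ps, and so on. Note that psselect comes with the psutils package, which needs to be installed separately.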