Working with Output#
Units#
The units of the output catalogs are generally decimal degrees for ra and dec,
radians North of East (I know…) for pa, arcseconds for rhalf, and nJy
for fluxes. Note that for q the default units are actually sqrt(b/a).
Basic output products#
For a given run with a given starting configuration and input catalog, multiple patches (regions on the sky and sub-catalogs) will have been fit using MCMC. There are two principal kinds of output, samples and residuals. These are specific to each patch. Therefore, the information is organized in a directory structure that looks something like:
root_path/- the base directory for a given run<config_file>.yml- The input configuration file used for the run.config.json- The full configuration dictionary used for the run as a json file; This includes any command line overrides of the config file parameters<outname>.fits- The full catalog with ‘current’ parameter values. Also includes information like number of MCMC samples per object. For optimization the ‘current’ values are the optmized values, while for sampling they are the values at the last iteration of the MCMC chain.<outname>_log.json- A record of which sources (as indices) appeared in each patch, and the order in which the patches are run.patches/patch<i>_samples.h5- information about the i th patch and resulting MCMC chainpatch<i>_residuals.h5- residuals (data - model) and other pixel data for the last iteration in the chain for the i th patch
The detailed structure of these output files is described below. Numerous
utilities for working with the basic output are supplied in
the forcepho.postprocess module.
High level products#
High level output products are the result of post-processing the basic output patch-level products (see above). These high-level products include final parameter catalogs. For sampling a combined chain catalog is also available, from which uncertainties can be estimated. DS9 region files showing the final sources and the patches can be generated. Finally, residual and model images can be produced.
Final parameter catalog#
After sampling, a catalog given MCMC chains of the posterior samples for every source parameter can be constructed from the patch-level chains. This is accomplished with
from forcepho.postprocess import postsample_catalog
postsample_catalog("<path/to/output/run/directory>", "<path/to/chain_catalog.fits>")
The output catalog will contain chains for each parameter of each source from which parameter uncertainties can be estimated.
Residual images#
If residuals were saved for each patch by the child processes, they can be assembled into combined residual (plus data and optionally model) images pixelwise matched to the input images.
from forcepho.postprocess import write_images
write_images("<path/to/output/run/directory>", metafile="<path/to/metastore.json>", show_model=True)
where path/to/metastore.json is the path to the meta-data storage for the
images used in the fitting. This will place residual images corresponding to the
input images in the metastore into the path/to/output/run/directory/images/
directory.
Region files#
DS9 region files can be useful for exploring any issues in the fitting. They can be produced as follows:
from forcepho.postprocess import write_patchreg, write_sourcereg
_ = write_patchreg("<path/to/output/run/directory>")
_ = write_sourcereg("<path/to/output/run/directory>", showid=True)
These region files will be placed in "path/to/output/run/directory/images/"
directory by default, but the output location can be specified as a keyword
argument.
Run log#
We can read in the overall run information as:
import json
from astropy.io import fits
root = "root_path"
outname = "outscene"
with open(f"{root}/{outname}_log.json") as f:
logs = json.load(f)
slog = logs["sourcelog"]
plog = logs["patchlog"]
outcat = fits.getdata(f"{root}/{outname}.fits")
with open(f"{root}/config.json") as f:
config = json.load(f)
Samples#
The patch<i>_samples.h5 output file contains info from the MCMC chain, as well
as a great deal of info about the patch itself and the way the fitting happened.
We can read it in to convenience class forcepho.postprocessing.Samples with
helpful methods as:
from forcepho.postprocessing import Samples
i = plog[0]
samples = Samples(f"{root}/patches/patch{i}_samples.h5")
Now we have an object which has some important attributes. These include:
bands- A list of strings giving the bands that were fit simultaneously. These are also the column names for fluxes in various catalogs.shape_cols- A list of strings giving the column names for fitted position and shape parameterschaincat- numpy structured array of parameter values in the MCMC chain. There is one row per active source, and each field will generally have shape(niter,)except for the informational ields (id, etc.)chaina plain numpy array of shape(niter, npar)that gives the MCMC chain. Generally not very useful, since it’s can be hard to figure out which column is which parameter, and the positions are all relative to the patch center. But since it’s not structured it can be useful for performing various math operations.active- a structured ndarray that is a catalog of the active (i.e. fitted) sources in the patch, including the starting parameter positionsfixed- a structured ndarray that is a catalog of the fixed i.e. (subtracted but not fitted) sources in the patch.final- likeactive, but containing the final parameter positions at the end of sampling (i.e. the state at the last iteration)bounds- a structured array giving the constraints (lower an upper bounds) used for each of the fitted parameters. It has fields likeactive, but with 2 elements (lo, hi) for each fitted parameter instead of 1cov- ndarray of shape(npar, npar)that gives the inverse mass matrix used in sampling. However, this matrix is defined in the unconstrained parameter space, which is related to the constrained parameter space by a non-linear transform, and thus is not super useful except as a diagnostic.
There are other attributes (config, ncall, wall_time, stats, ra, dec,
radius, reference_coordinates, etc.) that may be useful. Then if we
wanted to, say plot the trace for the F606W flux of the third galaxy in the
patch, we would do:
import matplotlib.pyplot as pl
fig, ax = pl.subplots()
ax.plot(samples.chaincat["F606W"][:, 2])
Or, to plot the trace for all the parameters of the third active source:
bands = samples.bands
fig, axes = pl.subplots(6 + len(bands))
source_idx = 2
samples.show_chain(source_idx, bands=bands, axes=axes)
Residuals#
The residuals for each patch are stored in a basic HDF5 file. The structure is
BAND/expososureID/xpix
BAND/expososureID/ypix
BAND/expososureID/data
BAND/expososureID/ierr
BAND/expososureID/residual
where each of these is a simple 1-d array. The residual is in the sense data minus model, where the model is usually the last iteration of the chain for that patch. But there are convenience methods for displaying these:
from forcepho.reconstruction import Residuals
filename = f"{root}/patches/patch{i}_residuals.h5"
res = Residuals(filename)
# all the exposures
print(res.exposures)
# show the second exposure (e=1) with scale +/- 1 nJy
# this shows data, residual, and model from left to right
# you can also specify the exposure by name, using `exp='blah'` instead of `e=1`
fig, raxes = pl.subplots(1, 3)
data, model, delta = res.show(e=1, axes=raxes, vmin=-1, vmax=1)
# mark fixed sources on the residual:
res.mark_sources(samples.fixed["ra"], samples.fixed["dec"], e=1, axes=[raxes[1]])
# find pixel locations of a particular ra, dec in a particular exposure:
pix = res.sky_to_pix(samples.active["ra"], samples.active["dec"], e=1)
There’s also a good deal of WCS info, but in highly non-standard format.
However, you can also use the Residuals object to place the data or residuals
into a larger image (though without WCS info at the moment.) Note that for
mosaics the full images may have been broken down into 2048 x 2048 tiles, so
that’s what gets generated by this method.
subdir = "images"
os.makedirs(f"{root}/{subdir}", exist_ok=True)
deltas = {} # images will be stored here keyed by exposureID
# we loop over patches in order so each pixel will be represented by the most
# recent patch/model that touched it
for i in plog:
r = Residuals(f"{root}/patches/patch{i}_residuals.h5")
# The imshape tells the size of the original exposure
# use fill_type = "data" or "ierr" to get the original pixel fluxes or inverse error.
r.fill_images(deltas, fill_type="residual", imshape=(2048, 2048))
# Note this does *not* include WCS info; that capability to be added soon.
for expID, im in deltas.items():
band, exp = expID.decode("utf-8").split("/")
fits.writeto(f"{root}/{subdir}/{exp}_delta.fits", im)
There is a handy method in forcepho.postprocess.write_images that will do this for all
patches and all the relevant exposures in the data store.