Plotting

A selection of plotting routines for Granule Explorer output data.

Outline

We provide a number of routines for visualising the data stored in “aggregate_fittings.h5”. These include various 1D and 2D histograms, quartile plots and error estimates.

flickerprint.analysis.plotting.histogram2D(plot_column, plot_title, plot_row, row_title, granule_data, group_by='experiment', plot_group='As', column_nbins=20, row_nbins=20, legend=True, log_scaleX=True, log_scaleY=True, save_png=True, out_dir='/tmp/', plot_data: bool = False) -> Tuple[figure, DataFrame]

A 2D histogram plot to visualise correlations between parameters. If it looks too sparse (not enough points per bin), use scatter_plot instead.

Parameters:
  • plot_column (str) – The name of the column in [granule_data] to be binned along the y-axis

  • plot_title (str) – The label for the y-axis

  • plot_row (str) – The name of the column in [granule_data] to be binned along the x-axis

  • row_title (str) – The label for the x-axis

  • granule_data (Pandas dataframe) – The granule data for the plot, see the section in the Docs on “aggregate_fitting.h5” for the format required of the dataframe

  • group_by (str) – The name of the column in [granule_data] which will be used to group the data before plotting. Only granules with a value of [plot_group] in this column will be plotted

  • plot_group (anything) – The value in [group_by] of granules that should be plotted

  • column_nbins (int) – The number of bins along the column axis

  • row_nbins (int) – The number of bins along the row axis

  • legend (bool) – Add a legend to the plot or not

  • log_scaleX (bool) – Set x axis to a log scale if true

  • log_scaleY (bool) – Set y axis to a log scale if true

  • save_png (bool) – Saves the figure to a png in [out_dir] if true

  • out_dir (str) – The path that the output figure should be saved to

Outputs:

Figure saved as a png in [out_dir] if [save_png] is true

Return type:

A matplotlib figure
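The binning idea behind a log-scaled 2D histogram, and the "too sparse" check mentioned above, can be sketched with NumPy alone. This is an illustrative sketch, not the code used by histogram2D: the helper names log_binned_counts and mean_points_per_occupied_bin and the synthetic radius and tension arrays are hypothetical, and the choice of log-spaced bin edges is an assumption about how the log-scale options are meant to interact with binning.

```python
import numpy as np


def log_binned_counts(x, y, x_nbins=20, y_nbins=20):
    """Count points on a 2D grid whose bin edges are evenly spaced in log space."""
    x = np.asarray(x, dtype=float)
    y = np.asarray(y, dtype=float)
    # Log-spaced edges only make sense for strictly positive values.
    keep = (x > 0) & (y > 0)
    x, y = x[keep], y[keep]
    x_edges = np.geomspace(x.min(), x.max(), x_nbins + 1)
    y_edges = np.geomspace(y.min(), y.max(), y_nbins + 1)
    counts, _, _ = np.histogram2d(x, y, bins=[x_edges, y_edges])
    return counts, x_edges, y_edges


def mean_points_per_occupied_bin(counts):
    """Average number of points in the bins that contain at least one point."""
    occupied = counts[counts > 0]
    return float(occupied.mean()) if occupied.size else 0.0


# Synthetic stand-ins for two columns of granule data (hypothetical values).
rng = np.random.default_rng(0)
radius = rng.lognormal(mean=0.0, sigma=0.5, size=500)
tension = rng.lognormal(mean=-2.0, sigma=1.0, size=500)

counts, x_edges, y_edges = log_binned_counts(radius, tension)
print("points binned:", int(counts.sum()))
print("mean points per occupied bin:", mean_points_per_occupied_bin(counts))
```

If the mean number of points per occupied bin is close to 1, the histogram carries little more information than the raw points, which is the situation where a scatter plot is likely the better choice.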

flickerprint.analysis.plotting.overlap_hist(plot_column, plot_label, granule_data: DataFrame, plot_errors=None, group_by='experiment', n_bins=20, agg=<function gmean>, density=False, legend=False, log_scale=True, quiet: bool = True, save_png=True, out_dir='/tmp/', plot_data: bool = False) -> Tuple[figure, DataFrame]

Draw overlapping histograms of [plot_column], split by [group_by].

Plots a histogram of a variable with the 67% of points closest to the median shown in a darker colour, and the average (as determined by agg) shown with a vertical line. Also prints a summary of the mean and error.

Parameters:
  • plot_column (str) – The name of the column in [granule_data] to be plotted as a histogram

  • plot_label (str) – The label for the x-axis

  • granule_data (Pandas dataframe) – The granule data for the plot, see the section in the Docs on “aggregate_fitting.h5” for the format required of the dataframe

  • plot_errors (str or None) – The column in [granule_data] containing the error estimates for the values in [plot_column], used to estimate the error bars on the histogram bars. If None, error bars are not plotted.

  • group_by (str) – The name of the column in [granule_data] which will be used to group the data before plotting. The graphs for each group will be plotted one on top of the other.

  • n_bins (int or array) – If int, then the number of bins. If array, then the bin edges.

  • agg (function Pandas dataseries -> float) – The function used to calculate the average shown by the vertical line. Usually some type of mean.

  • out_dir (str) – The path that the output figure should be saved to

  • density (bool) – If true, plot a probability density so the area under the graph is 1.

  • legend (bool) – Add a legend to the plot or not

  • log_scale (bool) – Set x axis to a log scale if true

  • benchling_format (bool) – If true, print summary to the screen, optimized for cutting and pasting into tables.

  • save_png (bool) – Saves the figure to a png in [out_dir] if true

Outputs:

Figure saved as a png in [out_dir] if [save_png] is true

Return type:

A matplotlib figure
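The two summary quantities described above, an average computed by agg and a central band holding 67% of the points, can be sketched as follows. This is a minimal illustration rather than the library's implementation. It assumes the band is the interval between the 16.5th and 83.5th percentiles, which is one reading of "the 67% of points closest to the median", and it computes the geometric mean with NumPy instead of calling the gmean function shown in the signature. The helper names and the synthetic sample are hypothetical.

```python
import numpy as np


def geometric_mean(values):
    """Geometric mean of strictly positive values: exp(mean(log(values)))."""
    values = np.asarray(values, dtype=float)
    return float(np.exp(np.mean(np.log(values))))


def central_interval(values, fraction=0.67):
    """Bounds of the central `fraction` of the data, symmetric in percentile."""
    tail = 100.0 * (1.0 - fraction) / 2.0
    low, high = np.percentile(values, [tail, 100.0 - tail])
    return float(low), float(high)


# Synthetic stand-in for one column of granule data (hypothetical values).
rng = np.random.default_rng(1)
sample = rng.lognormal(mean=0.0, sigma=1.0, size=1000)

low, high = central_interval(sample)
inside = (sample >= low) & (sample <= high)
print(f"geometric mean = {geometric_mean(sample):.3g}")
print(f"{inside.mean():.0%} of points lie between {low:.3g} and {high:.3g}")
```

A geometric mean is a natural default for quantities plotted on a log axis, because it is the ordinary mean of the values as they appear on that axis.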

flickerprint.analysis.plotting.pair_plot(granule_data: DataFrame, save_png=True, out_dir: Path = '/tmp/') -> Tuple[figure, DataFrame]

Uses seaborn’s pairplot to draw 1D and 2D histograms of the Surface Tension, Bending Rigidity and Mean Radius for granules in [granule_data].

Parameters:
  • granule_data (Pandas dataframe) – The granule data for the plot, see the section in the Docs on “aggregate_fitting.h5” for the format required of the dataframe

  • save_png (bool) – Saves the figure to a png in [out_dir] if true

  • out_dir (str) – The path that the output figure should be saved to

Outputs:

Figure saved as a png in [out_dir] if [save_png] is true

Returns:

A matplotlib figure

flickerprint.analysis.plotting.read_data(input_file, comp_file=None, data_file_name='aggregate_fittings.h5')

Reads in one or more aggregate_fittings.h5 files and concatenates them into a single data frame.

Parameters:
  • input_file (str) – The path to either a [data_file_name] file or a folder containing data files. If a file, it will open that file as a data frame and return it. If a folder, it will recursively search subfolders for files named [data_file_name], open all the files and concatenate the result into a single data frame.

  • comp_file (str) – This is for backwards compatibility only! If you have data from before May 2022, it may come with a separate “comparison” file containing additional information. This parameter should be a path to that file, otherwise None.

  • data_file_name (str) – The name of the .h5 file to open. Default: aggregate_fittings.h5

Return type:

A pandas data frame containing all the data from the .h5 files opened
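The file-or-folder behaviour described for input_file can be sketched as below. This is not the source of read_data: the helpers find_data_files and load_all are hypothetical names, the comp_file handling is left out, and load_all assumes each .h5 file holds a single pandas object so that pandas.read_hdf can be called without a key (which also requires PyTables to be installed).

```python
from pathlib import Path


def find_data_files(input_path, data_file_name="aggregate_fittings.h5"):
    """Return [input_path] if it is a file, else every matching file below it."""
    path = Path(input_path)
    if path.is_file():
        return [path]
    # rglob searches the folder and all of its subfolders; sorting keeps the
    # order reproducible between runs.
    return sorted(path.rglob(data_file_name))


def load_all(paths):
    """Read each file into a data frame and concatenate them into one."""
    # Imported here so that the file search above works without pandas.
    import pandas as pd

    if not paths:
        raise FileNotFoundError("no data files found")
    frames = [pd.read_hdf(p) for p in paths]
    return pd.concat(frames, ignore_index=True)
```

Separating the search from the loading makes it easy to print the list of files that will be combined before any data is read.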