Pod5Read
The Pod5Read struct grants access to the signal and metadata of a single read through a set of getter functions. A read cannot be initialized manually; it has to be retrieved from a pod5 file or dataset.
The following table shows all fields that are accessible from a Pod5Read. The descriptions are taken mostly from the official pod5 docs.
| Field | Description |
|---|---|
| read_id | Globally-unique identifier for the read, can be converted to a string form (using standard routines in other libraries) which matches how reads are identified elsewhere |
| signal | The actual signal |
| signal_indices | A list of zero-indexed row numbers in the Signal table. This must be all the rows in the Signal table that have a matching read_id, in order. It functions as an index for the Signal table |
| read_number | The read number on the channel. This is increasing but not necessarily consecutive |
| start | How many samples were taken on this channel before the read started (since the data acquisition period began). This can be combined with the sample rate to get a time in seconds for the start of the read relative to the start of data acquisition |
| median_before | The level of current in the well before this read (typically the open pore level of the well). If the level is not known (eg: due to a mux change), this should be nulled out |
| num_minknow_events | Number of minknow events that the read contains |
| tracked_scaling | Collects tracked_scaling_shift (Shift for tracked read scaling values (based on previous reads shift)) and tracked_scaling_scale (Scale for tracked read scaling values (based on previous reads shift)) |
| predicted_scaling | Collects predicted_scaling_shift (Shift for predicted read scaling values (based on this read's raw signal)) and predicted_scaling_scale (Scale for predicted read scaling values (based on this read's raw signal)) |
| num_reads_since_mux_change | Number of selected reads since the last mux change on this read's channel |
| time_since_mux_change | Time in seconds since the last mux change on this read's channel |
| num_samples | The full length of the signal for this read in samples (equal to the sum of all 'samples' fields of signal chunks) |
| pore | Collects the pore_type (Name of the pore type present in the well), channel (1-indexed channel) and well (1-indexed well (typically 1, 2, 3 or 4)) information |
| calibration | Collects calibration_offset (Calibration offset used to scale raw ADC data into pA readings) and calibration_scale (Calibration scale factor used to scale raw ADC data into pA readings) |
| end_reason | Collects end_reason (the end reason, currently one of: unknown, mux_change, unblock_mux_change, data_service_unblock_mux_change, signal_positive, signal_negative, api_request, device_data_error, analysis_config_change or paused) and end_reason_forced (true if this read was ended 'forcibly' (eg: mux_change, unblock), false if it was a data-driven read break (signal_positive, signal_negative); this allows simple categorisation even in the presence of new reasons that reading code is unaware of) |
| run_info_id | ID that matches the acquisition_id in the run info stored for the overarching Pod5File |
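
Two of the descriptions in the table imply a small calculation: start is a sample count that becomes a time once divided by the sample rate, and the calibration values turn raw ADC samples into pA. The following is a minimal sketch of both, not part of this crate. The function names are hypothetical, the sample rate is assumed to come from the run info, and the conversion is assumed to follow the convention pA = (raw + offset) * scale.

```rust
/// Start of the read in seconds, relative to the start of data acquisition.
/// `sample_rate_hz` is an assumption: it is expected to come from the run info.
fn start_seconds(start_samples: u64, sample_rate_hz: u32) -> f64 {
    start_samples as f64 / f64::from(sample_rate_hz)
}

/// Convert raw ADC samples to picoamperes.
/// Assumes the convention pA = (raw + offset) * scale.
fn to_picoamps(raw: &[i16], offset: f32, scale: f32) -> Vec<f32> {
    raw.iter()
        .map(|&sample| (f32::from(sample) + offset) * scale)
        .collect()
}

fn main() {
    // 8000 samples at 4000 Hz correspond to 2 seconds.
    let seconds = start_seconds(8_000, 4_000);
    println!("read starts after {seconds} s");

    // Illustrative values only, not taken from a real read.
    let pa = to_picoamps(&[100, 200], 10.0, 0.5);
    println!("calibrated signal: {pa:?}");
}
```

With a real read, the inputs would be taken from the start, calibration and signal fields once they have been unwrapped.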
For each field, the user can call either the standard getter function (e.g. Pod5Read::read_id()), which wraps the value in an Option, or the require function (e.g. Pod5Read::require_read_id()), which wraps the value in a Result. All fields should be present for a standard read, but the internal Arrow schema declares them as optional, so they can technically be missing. This is why the values are not directly available.
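
The difference between the two access styles can be sketched with a stand-in type. DemoRead, MissingField and describe below are hypothetical and only mirror the getter/require pattern described above; the real Pod5Read getters may use different return and error types.

```rust
use std::fmt;

/// Hypothetical stand-in for a read with a single optional field.
struct DemoRead {
    read_number: Option<u32>,
}

/// Hypothetical error reported when a required field is absent.
#[derive(Debug, PartialEq)]
struct MissingField(&'static str);

impl fmt::Display for MissingField {
    fn fmt(&self, f: &mut fmt::Formatter<'_>) -> fmt::Result {
        write!(f, "missing field: {}", self.0)
    }
}

impl std::error::Error for MissingField {}

impl DemoRead {
    /// Standard getter: the caller decides what to do with `None`.
    fn read_number(&self) -> Option<u32> {
        self.read_number
    }

    /// Require function: a missing value becomes an error.
    fn require_read_number(&self) -> Result<u32, MissingField> {
        self.read_number.ok_or(MissingField("read_number"))
    }
}

/// With the require style, `?` propagates a missing field to the caller.
fn describe(read: &DemoRead) -> Result<String, MissingField> {
    let number = read.require_read_number()?;
    Ok(format!("read number {number}"))
}

fn main() {
    let complete = DemoRead { read_number: Some(7) };
    let partial = DemoRead { read_number: None };

    // Option style: fall back to a default when the field is missing.
    println!("{}", partial.read_number().unwrap_or(0));

    // Result style: either a value or a descriptive error.
    println!("{:?}", describe(&complete));
    println!("{:?}", describe(&partial));
}
```

The Option getters fit cases where a missing field has a sensible fallback, while the require functions fit code that should stop with an error as soon as a field is absent.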