Examples
General examples
Example 1: Reference-to-signal with positions of interest (no signal in input table)
In this example, the alignment file does not contain raw signal data, so the corresponding POD5 input must be provided. We extract the mean, standard deviation, and dwell time of the signal around given reference positions and output a melted TSV.
```bash
fishnet reformat \
  --alignment alignments_ref.parquet \
  --pod5 /data/pod5_runs/run1 /data/pod5_runs/run2 \
  --positions-of-interest chr1:100000-10 chr2:250000-15 \
  --strategy stats \
  --stats mean std dwell \
  --out ref_positions_stats.tsv \
  --output-shape melted \
  --threads 8 \
  --force-overwrite
```
Explanation:
- --alignment provides reference-to-signal mappings.
- --pod5 supplies raw signal data (since it’s missing in the alignment file).
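- --positions-of-interest defines windows around the given reference positions: ±10 bases around chr1:100000 and ±15 bases around chr2:250000.
- --strategy stats computes per-base signal statistics, here the ones listed in --stats (mean, standard deviation and dwell time).
- --output-shape melted writes a long-format TSV with one row per read, region and base.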
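The explanation reads chr1:100000-10 as a ±10-base window around position 100000. The sketch below expands such a value into inclusive window bounds. It assumes a CHROM:POSITION-HALFWIDTH form, which is an interpretation and not a confirmed fishnet specification. parse_position_of_interest is a hypothetical helper.

```python
from __future__ import annotations


def parse_position_of_interest(spec: str) -> tuple[str, int, int]:
    """Split 'CHROM:POSITION-HALFWIDTH' into (chrom, window_start, window_end).

    Hypothetical helper. Assumes the number after '-' is a half-width, as the
    ±10 / ±15 explanation suggests; both window ends are inclusive.
    """
    chrom, _, rest = spec.rpartition(":")
    position, _, half_width = rest.partition("-")
    center, flank = int(position), int(half_width)
    return chrom, center - flank, center + flank


for spec in ["chr1:100000-10", "chr2:250000-15"]:
    print(spec, "->", parse_position_of_interest(spec))
```

Under this reading, a half-width of 10 gives a window of 21 bases.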
Example 2: Query-to-signal with motif filtering and interpolation
Here, the alignment file already contains the raw signal (so no --pod5 input is needed) and includes both reference and query alignments. We select the query alignment, filter by motifs from a FASTA file, and interpolate each base's signal to a uniform length of 50 samples. The result is stored as a nested Parquet file.
```bash
fishnet reformat \
  --alignment alignments_query_signal.parquet \
  --alignment-type query \
  --motifs-file motifs.fasta \
  --strategy interpolate \
  --target-size 50 \
  --out interpolated_query_signal.parquet \
  --output-shape nested \
  --threads 8 \
  --force-overwrite
```
Explanation:
- --alignment-type query selects the query-to-signal mappings.
- --motifs-file loads the motifs (e.g., ATGCGT or TTTAAA) from a FASTA file.
- --strategy interpolate together with --target-size 50 resamples each base's signal to a vector of 50 values, so every base has the same length.
- --output-shape nested keeps the per-base signal arrays intact in the Parquet file, which makes the output well suited as machine learning input.
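This section does not spell out how fishnet applies the motifs. The sketch below loads motifs from FASTA text and locates every occurrence in a sequence. It is hypothetical: read_fasta_motifs and motif_hits are illustrative names, and treating a motif hit as the filter criterion is an assumption.

```python
from __future__ import annotations

import io


def read_fasta_motifs(handle) -> list[str]:
    """Collect one uppercase sequence per FASTA record (multi-line records are joined)."""
    motifs, current = [], []
    for raw in handle:
        line = raw.strip()
        if not line:
            continue
        if line.startswith(">"):          # a header starts a new record
            if current:
                motifs.append("".join(current).upper())
                current = []
        else:
            current.append(line)
    if current:                            # flush the last record
        motifs.append("".join(current).upper())
    return motifs


def motif_hits(sequence: str, motifs: list[str]) -> list[tuple[str, int]]:
    """Return (motif, 0-based start) for every occurrence, overlaps included."""
    seq = sequence.upper()
    hits = []
    for motif in motifs:
        start = seq.find(motif)
        while start != -1:
            hits.append((motif, start))
            start = seq.find(motif, start + 1)
    return hits


# Toy FASTA text standing in for motifs.fasta (hypothetical data).
motifs = read_fasta_motifs(io.StringIO(">m1\nATGCGT\n>m2\nTTT\nAAA\n"))
print(motifs)
print(motif_hits("CCATGCGTTTTAAAG", motifs))
```

For a real file, an open file handle such as open("motifs.fasta") can be passed instead of the StringIO object.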
Detailed (minimal) processing example
The following example shows what gets calculated and how it is written to file with different output settings. Coordinates are 1-based and inclusive, so chr1:5-6 covers the 5th and 6th reference base. The example uses this setup:
- Reference-to-signal alignments of two reads:
  1. readA maps to chr1:3-8
  2. readB maps to chr1:4-14
- Reference regions of interest:
  - chr1:5-6
  - chr1:12-13
- For base-wise stats, mean and dwell are computed.
- For interpolation, a target size of 3 is used.
```
0-based index:        0 1 2 3 4 5 6 7 8 9 0 1 2 3 4
1-based index:        1 2 3 4 5 6 7 8 9 0 1 2 3 4 5
Ref sequence:         A C G T|A G|C T A A A|G T|C T
                             |   |         |   |
readA:                    G T|A G|C T      |   |
readB:                      T|A G|C T A A A|G T|C
                             |   |         |   |
regions of interest:       chr1:5-6      chr1:12-13
```
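The start_index_on_read column used in the tables follows from simple coordinate arithmetic on this toy layout. The sketch below reproduces it under a few assumptions. All spans are 1-based and inclusive. Only regions fully inside a read are reported, since partial-overlap handling is not described here. The reads align without insertions or deletions, as in the diagram. covered_regions is a hypothetical helper.

```python
# Toy data from the example: 1-based, inclusive reference spans.
READS = {"readA": (3, 8), "readB": (4, 14)}
REGIONS = [(5, 6), (12, 13)]          # chr1:5-6 and chr1:12-13


def covered_regions(read_span, regions):
    """Yield (region, start_index_on_read) for regions fully inside the read.

    Assumes no insertions or deletions, so the offset on the read equals
    the offset on the reference.
    """
    read_start, read_end = read_span
    for reg_start, reg_end in regions:
        if read_start <= reg_start and reg_end <= read_end:
            yield (reg_start, reg_end), reg_start - read_start   # 0-based offset


rows = [
    (read_id, f"chr1:{start}-{end}", offset)
    for read_id, span in READS.items()
    for (start, end), offset in covered_regions(span, REGIONS)
]
for row in rows:
    print(row)
```

The resulting offsets 2, 1 and 8 match the start_index_on_read values in the output tables.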
Base-wise stats
```bash
fishnet reformat \
  --alignment [...] \
  --ref-regions "chr1:5-6" "chr1:12-13" \
  --strategy "stats" \
  --stats "mean" "dwell" \
  --out [...] \
  --output-shape [...]
```
With mean and dwell selected as statistics, both values are calculated for readA at the 5th and 6th reference base, and for readB at the 5th, 6th, 12th and 13th reference base. readA only spans chr1:3-8, so it has no rows for chr1:12-13.
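The mA5, dA5, ... placeholders stand for simple summaries of each base's raw samples. The sketch below is minimal and makes two assumptions. First, "dwell" is taken to be the number of samples assigned to a base; fishnet may report time or use another unit. Second, std is the population standard deviation. base_stats is a hypothetical helper.

```python
from __future__ import annotations

from statistics import fmean, pstdev


def base_stats(chunk: list[float]) -> dict[str, float]:
    """Summarise the raw samples assigned to one base.

    Assumptions: dwell = number of samples (not seconds); std = population std.
    """
    return {"mean": fmean(chunk), "std": pstdev(chunk), "dwell": len(chunk)}


# Two toy chunks of raw samples (illustrative values, not real data).
for chunk in ([80.0, 82.0, 84.0, 86.0, 88.0], [101.0, 99.0, 100.0]):
    stats = base_stats(chunk)
    print(stats["mean"], stats["dwell"])
```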
The melted output would look like this:
| read_id | start_index_on_read | region_of_interest | base_index | base | mean | dwell |
|---|---|---|---|---|---|---|
| readA | 2 | chr1:5-6 | 0 | A | mA5 | dA5 |
| readA | 2 | chr1:5-6 | 1 | G | mA6 | dA6 |
| readB | 1 | chr1:5-6 | 0 | A | mB5 | dB5 |
| readB | 1 | chr1:5-6 | 1 | G | mB6 | dB6 |
| readB | 8 | chr1:12-13 | 0 | G | mB12 | dB12 |
| readB | 8 | chr1:12-13 | 1 | T | mB13 | dB13 |
The exploded format would look like this:
| read_id | start_index_on_read | region_of_interest | base_0 | base_1 | mean_0 | mean_1 | dwell_0 | dwell_1 |
|---|---|---|---|---|---|---|---|---|
| readA | 2 | chr1:5-6 | A | G | mA5 | mA6 | dA5 | dA6 |
| readB | 1 | chr1:5-6 | A | G | mB5 | mB6 | dB5 | dB6 |
| readB | 8 | chr1:12-13 | G | T | mB12 | mB13 | dB12 | dB13 |
The nested format would look like this:
| read_id | start_index_on_read | region_of_interest | bases | mean | dwell |
|---|---|---|---|---|---|
| readA | 2 | chr1:5-6 | AG | [mA5, mA6] | [dA5, dA6] |
| readB | 1 | chr1:5-6 | AG | [mB5, mB6] | [dB5, dB6] |
| readB | 8 | chr1:12-13 | GT | [mB12, mB13] | [dB12, dB13] |
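The three shapes carry the same values; fishnet writes the chosen one directly via --output-shape. As a minimal sketch, the conversion from melted rows to the nested and exploded layouts could look like the code below. It assumes rows are plain dictionaries keyed by the column names in the tables. to_nested and to_exploded are hypothetical helpers, not fishnet functions.

```python
from itertools import groupby

KEY = ("read_id", "start_index_on_read", "region_of_interest")
STATS = ("mean", "dwell")


def region_key(row):
    return tuple(row[k] for k in KEY)


def to_nested(melted):
    """One row per (read, region); per-base values collected into lists.

    Assumes rows of the same region are contiguous, as in the melted table.
    """
    nested = []
    for key, group in groupby(melted, key=region_key):
        bases = sorted(group, key=lambda r: r["base_index"])
        row = dict(zip(KEY, key))
        row["bases"] = "".join(r["base"] for r in bases)
        for stat in STATS:
            row[stat] = [r[stat] for r in bases]
        nested.append(row)
    return nested


def to_exploded(nested):
    """Spread each list into suffixed columns (base_0, mean_0, dwell_0, ...)."""
    exploded = []
    for row in nested:
        flat = {k: row[k] for k in KEY}
        for i, base in enumerate(row["bases"]):
            flat[f"base_{i}"] = base
        for stat in STATS:
            for i, value in enumerate(row[stat]):
                flat[f"{stat}_{i}"] = value
        exploded.append(flat)
    return exploded


# The two readA rows of the melted table, with placeholder values.
common = {"read_id": "readA", "start_index_on_read": 2, "region_of_interest": "chr1:5-6"}
melted = [
    {**common, "base_index": 0, "base": "A", "mean": "mA5", "dwell": "dA5"},
    {**common, "base_index": 1, "base": "G", "mean": "mA6", "dwell": "dA6"},
]
nested = to_nested(melted)
exploded = to_exploded(nested)
print(nested)
print(exploded)
```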
Interpolation
```bash
fishnet reformat \
  --alignment [...] \
  --ref-regions "chr1:5-6" "chr1:12-13" \
  --strategy "interpolate" \
  --target-size 3 \
  --out [...] \
  --output-shape [...]
```
The per-base signal is interpolated to a target size of 3. This results in the interpolated signal for readA at the 5th and 6th reference base, and for readB at the 5th, 6th, 12th and 13th reference base.
Here is a diagram to show what the data would look like:
```
Raw per-base signal chunks (variable lengths):

readA
  base 5  → [ . . . . . ]                    (5 measurements)
  base 6  → [ . . . . . . . . . . . ]        (11 measurements)
readB
  base 5  → [ . . . . ]                      (4 measurements)
  base 6  → [ . . . . . . . . . . . . . ]    (13 measurements)
  base 12 → [ . . . . . . . . . . ]          (10 measurements)
  base 13 → [ . . . . . . . ]                (7 measurements)

After interpolation to target size = 3:

readA
  base 5  → [ sA5_0  sA5_1  sA5_2 ]          (3 measurements)
  base 6  → [ sA6_0  sA6_1  sA6_2 ]          (3 measurements)
readB
  base 5  → [ sB5_0  sB5_1  sB5_2 ]          (3 measurements)
  base 6  → [ sB6_0  sB6_1  sB6_2 ]          (3 measurements)
  base 12 → [ sB12_0 sB12_1 sB12_2 ]         (3 measurements)
  base 13 → [ sB13_0 sB13_1 sB13_2 ]         (3 measurements)
```
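The resampling method is not stated here. The minimal sketch below assumes plain linear interpolation at evenly spaced points from the first to the last sample. This is the same result as numpy.interp evaluated on np.linspace(0, n - 1, target_size). It reduces each variable-length chunk to target_size values. interpolate_chunk is a hypothetical name.

```python
from __future__ import annotations


def interpolate_chunk(chunk: list[float], target_size: int) -> list[float]:
    """Resample one base's samples to `target_size` values by linear interpolation.

    Output points are evenly spaced from the first to the last sample.
    """
    n = len(chunk)
    if n == 0 or target_size < 1:
        raise ValueError("need a non-empty chunk and target_size >= 1")
    if n == 1:
        return [float(chunk[0])] * target_size      # nothing to interpolate
    if target_size == 1:
        positions = [(n - 1) / 2]                    # single point: chunk centre
    else:
        step = (n - 1) / (target_size - 1)
        positions = [j * step for j in range(target_size)]
    out = []
    for x in positions:
        left = min(int(x), n - 2)                    # index of the left neighbour
        frac = x - left                              # weight of the right neighbour
        out.append(chunk[left] * (1 - frac) + chunk[left + 1] * frac)
    return out


# Toy chunks with the lengths shown for readA (5 and 11 samples).
print(interpolate_chunk([1.0, 2.0, 3.0, 4.0, 5.0], 3))
print(interpolate_chunk([float(v) for v in range(11)], 3))
```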
The melted output would look like this:
| read_id | start_index_on_read | region_of_interest | base_index | base | signal_0 | signal_1 | signal_2 | dwell |
|---|---|---|---|---|---|---|---|---|
| readA | 2 | chr1:5-6 | 0 | A | sA5_0 | sA5_1 | sA5_2 | dA5 |
| readA | 2 | chr1:5-6 | 1 | G | sA6_0 | sA6_1 | sA6_2 | dA6 |
| readB | 1 | chr1:5-6 | 0 | A | sB5_0 | sB5_1 | sB5_2 | dB5 |
| readB | 1 | chr1:5-6 | 1 | G | sB6_0 | sB6_1 | sB6_2 | dB6 |
| readB | 8 | chr1:12-13 | 0 | G | sB12_0 | sB12_1 | sB12_2 | dB12 |
| readB | 8 | chr1:12-13 | 1 | T | sB13_0 | sB13_1 | sB13_2 | dB13 |
The exploded format would look like this:
| read_id | start_index_on_read | region_of_interest | base_0 | base_1 | signal_base0_0 | signal_base0_1 | signal_base0_2 | signal_base1_0 | signal_base1_1 | signal_base1_2 | dwell_0 | dwell_1 |
|---|---|---|---|---|---|---|---|---|---|---|---|---|
| readA | 2 | chr1:5-6 | A | G | sA5_0 | sA5_1 | sA5_2 | sA6_0 | sA6_1 | sA6_2 | dA5 | dA6 |
| readB | 1 | chr1:5-6 | A | G | sB5_0 | sB5_1 | sB5_2 | sB6_0 | sB6_1 | sB6_2 | dB5 | dB6 |
| readB | 8 | chr1:12-13 | G | T | sB12_0 | sB12_1 | sB12_2 | sB13_0 | sB13_1 | sB13_2 | dB12 | dB13 |
The nested format would look like this:
| read_id | start_index_on_read | region_of_interest | bases | signal | dwell |
|---|---|---|---|---|---|
| readA | 2 | chr1:5-6 | AG | [[sA5_0, sA5_1, sA5_2], [sA6_0, sA6_1, sA6_2]] | [dA5, dA6] |
| readB | 1 | chr1:5-6 | AG | [[sB5_0, sB5_1, sB5_2], [sB6_0, sB6_1, sB6_2]] | [dB5, dB6] |
| readB | 8 | chr1:12-13 | GT | [[sB12_0, sB12_1, sB12_2], [sB13_0, sB13_1, sB13_2]] | [dB12, dB13] |
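Every base is resampled to the same target size, so each nested signal cell is an n_bases × target_size matrix. Regions of equal length can therefore be stacked into one fixed-shape batch, which is what makes the nested shape convenient for machine learning. The sketch below uses the stdlib only and toy values. stack_regions is a hypothetical helper; in practice the Parquet file would first be loaded with a reader such as pyarrow.

```python
from __future__ import annotations


def stack_regions(rows: list[dict], target_size: int) -> list[list[list[float]]]:
    """Stack nested `signal` cells into a (n_rows, n_bases, target_size) batch.

    Raises ValueError if a cell does not have the expected fixed shape.
    """
    batch = []
    n_bases = None
    for row in rows:
        matrix = row["signal"]
        if n_bases is None:
            n_bases = len(matrix)                  # first row fixes the base count
        if len(matrix) != n_bases or any(len(b) != target_size for b in matrix):
            raise ValueError(f"unexpected signal shape for {row['read_id']}")
        batch.append(matrix)
    return batch


# Toy nested rows shaped like the table above (2 bases x 3 values each).
rows = [
    {"read_id": "readA", "signal": [[1.0, 2.0, 3.0], [4.0, 5.0, 6.0]]},
    {"read_id": "readB", "signal": [[0.5, 0.6, 0.7], [0.8, 0.9, 1.0]]},
]
batch = stack_regions(rows, target_size=3)
print(len(batch), len(batch[0]), len(batch[0][0]))
```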