Image processing tools for petabyte-scale light sheet microscopy data

Generic computing framework

Our generic computing framework supports both single machines and large-scale Slurm-based computing clusters with CPU and/or GPU node configurations. The conductor job orchestrates the processing after it receives a collection of function strings (a MATLAB function call or a Bash command executed by each worker), input file names, output file names and relevant parameters for job settings, such as required memory, the number of CPU cores and the system environment. The conductor job initially checks for the presence of output files, skipping those that already exist. In single-machine setups or when Slurm job submission is disabled, the conductor job executes the tasks sequentially. Conversely, in cluster environments with the Slurm job scheduler, the conductor job formats and submits Slurm commands based on the function strings and job parameters, delegating tasks to workers in the cluster. It continuously monitors these jobs, ensuring the completion of the tasks. If a worker job fails, the conductor job resubmits it with increased memory and CPU resources, often doubling the original specifications, until all tasks are completed or a preset maximum retry limit is reached. Additionally, the framework allows users to define a custom configuration file. This feature tailors Slurm-related parameters to specific needs, ensuring adaptability to various user-defined function strings and compatibility with different Slurm-based computing clusters.

Fast Tiff and Zarr readers and writers

Our Tiff reader/writer leverages the capabilities of the libtiff library in C++ with the MATLAB MEX interface. When reading, a binary search is used to determine the number of z-slices by identifying the last valid slice, as there is no direct way to query the number of z-slices in libtiff. The OpenMP framework is then used to distribute the reading tasks across multiple threads, partitioning the z-slices into evenly sized batches (except for the last one). For large 2D images, the Tiff strips are partitioned to facilitate multi-threaded reading using the OpenMP framework. For the Tiff writer, LZW compression from libtiff is adapted to support compression of individual z-slices. This approach enables parallel compression across z-slices, leveraging the OpenMP framework for multi-threading. The final compressed data are written to disk using a single thread because a Tiff file is a single container, making parallel writing of compressed data infeasible.

As MATLAB lacks a native Zarr reader and writer, we developed custom C++ code that complies with the Zarr specification (version 2) with enhanced parallelization. This code is also integrated with MATLAB through the MEX interface. In our implementation, the OpenMP framework is used for both reading and writing to distribute the tasks across multiple threads, treating each chunk as a separate task. We use the compression algorithms from the Blosc library57, which introduces an additional layer of multi-threading, thus optimizing the use of system resources. Zstd compression with a level 1 setting is used to achieve an optimal balance of compression ratio and read/write time. The high compression ratio of zstd substantially reduces the overall data size, reducing network load, particularly in extensive high-throughput processing scenarios where the network is often the primary bottleneck. By default, we read and write Zarr files in the ‘Fortran’ (column-major layout) order because MATLAB is based on ‘Fortran’ order, and converting between ‘C’ (row-major layout) and ‘Fortran’ orders adds additional overhead.
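To illustrate the z-slice discovery step of the Tiff reader described above, the following minimal MATLAB sketch mimics the binary search with MATLAB's built-in Tiff class; the production reader implements this in C++ with libtiff and OpenMP, and the function and variable names here are illustrative only.

```matlab
function n = countTiffSlices(filename)
% Binary search for the last valid IFD (z-slice) of a Tiff file, mimicking
% in MATLAB the strategy used by the C++/libtiff reader (a sketch, not the
% actual implementation).
t = Tiff(filename, 'r');
cleaner = onCleanup(@() close(t));     % close the file on exit or error
lo = 1;  hi = 2;
while isValidDir(t, hi)                % grow an upper bound exponentially
    lo = hi;  hi = 2 * hi;
end
while lo < hi - 1                      % narrow down to the last valid IFD
    mid = floor((lo + hi) / 2);
    if isValidDir(t, mid), lo = mid; else, hi = mid; end
end
n = lo;
end

function ok = isValidDir(t, k)
% setDirectory errors if IFD k does not exist (assumed behavior).
ok = true;
try
    t.setDirectory(k);
catch
    ok = false;
end
end
```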
Combined deskew, rotation, resampling and cropping

We execute deskew, rotation and resampling (if needed) in a single step by combining these geometric transformations. The fundamental geometric transform is

$${I}_{t}={F}_{T}(I)$$

where I is the original image, It represents the transformed image, and FT( ⋅ ) denotes the image warp function corresponding to the geometric transformation matrix T. The deskew operator applies a shear transformation defined by the shear transformation matrix Sds. In the rotation process, there are four sub-steps: translating the origin to the image center, resampling in the z axis to achieve isotropic voxels, rotating about the y axis, and translating the origin back to the starting index. Let the transformation matrices be denoted as T1, S, R and T2, respectively. If resampling factors are provided (1 by default), there are three additional sub-steps in resampling: translating the origin to the image center, resampling based on the provided factors, and translating the origin back to the starting index. Let the transformation matrices in these sub-steps be TR1, SR and TR2, respectively.

Traditionally, these three steps are executed independently, resulting in multiple geometric transformations. However, this incurs substantial overhead in run time and memory usage, particularly during the deskew step. Instead, we combine deskew, rotation and resampling into one single step, resulting in a unified affine transformation matrix:

$$A={S}_{ds}({T}_{1}SR{T}_{2})({T}_{R1}{S}_{R}{T}_{R2})$$

This affine transformation matrix can be directly applied to the raw image if the scan step size is sufficiently small. A quantity, denoted the ‘skew factor’, is defined to describe the relative step size as

$${f}_{sk}={d}_{z}\cos \theta /{p}_{x}$$

where θ ∈ (−π/2, π/2] is the skew angle, dz denotes the scan step size, and px is the pixel size in the xy plane. If fsk ≤ 2, the direct combined processing operates smoothly without noticeable artifacts. For fsk > 2, interpolation of the raw data within the skewed space is performed before deskew and rotation, taking into account the proper relative positions of slices. Neighboring slices above and below are utilized to interpolate a z-slice. Let ws and wt = 1 − ws represent the normalized distances (ranging from 0 to 1) along the z axis between the neighboring slices and the target z-slice. In the interpolation, we first create two planes aligned with the correct voxel positions of the target z-slice by displacing the neighboring slices by specific distances in the x direction (\({w}_{s}{d}_{z}\cos \theta\) and \({w}_{t}{d}_{z}\cos \theta\), respectively). Following this, the target z-slice is obtained by linearly interpolating these two planes along the z axis with weights 1 − ws and 1 − wt. Because the image warp function permits the specification of the output view, we have also incorporated a cropping feature by providing a bounding box that allows us to skip empty regions or capture specific regions.

For combined deskew and rotation without resampling, the transformation simplifies to a 2D operation in the xz plane. We optimized this scenario using SIMD (single instruction, multiple data) programming in C++. Our implementation directly supports uint16 input and output for both skewed space interpolation and deskew/rotation functions, while utilizing single-precision floating-point for intermediate steps. This improves throughput and reduces memory usage. In acquisition modes where the deskew operation is unnecessary (for example, the objective scan mode of LLSM), the above processing can still be applied, provided Sds is replaced with the identity matrix.
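The combined transformation can be sketched with standard MATLAB Image Processing Toolbox functions. The sketch below uses the row-vector (post-multiply) convention of affine3d; the scan step, pixel size, angle, axis conventions and the input volume vol are illustrative assumptions rather than the exact conventions of our implementation.

```matlab
% Assumed acquisition parameters (sample scan LLSM); adjust to the data.
dz = 0.3; px = 0.108; theta = 32.45;        % scan step (um), xy pixel (um), angle (deg)
shear = dz * cosd(theta) / px;              % x shift per z-slice (deskew)
zfac  = dz * sind(theta) / px;              % z scaling to reach isotropic voxels

sz = size(vol);                             % vol: raw uint16 stack (y, x, z)
c  = ([sz(2) sz(1) sz(3)] + 1) / 2;         % volume center in (x, y, z)

Sds = eye(4); Sds(3, 1) = shear;            % shear: x' = x + shear * z
T1  = eye(4); T1(4, 1:3) = -c;              % move the origin to the center
S   = diag([1 1 zfac 1]);                   % resample z to isotropic voxels
R   = [cosd(theta) 0 -sind(theta) 0
       0           1  0           0
       sind(theta) 0  cosd(theta) 0
       0           0  0           1];       % rotation about the y axis
T2  = eye(4); T2(4, 1:3) = c;               % move the origin back

A   = Sds * (T1 * S * R * T2);              % single combined affine matrix
dsr = imwarp(vol, affine3d(A), 'linear');   % one interpolation pass
% An imref3d 'OutputView' argument can additionally crop the output to a
% precomputed bounding box, skipping empty regions.
```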
Deconvolution

RL deconvolution has the form of

$${x}^{(k+1)}={x}^{(k)}\left(\frac{I}{{x}^{(k)}\circledast f}\circledast {b}^{T}\right)$$

where I is the raw data, f is the forward projector (that is, the PSF), b is the backward projector and bT is the transpose of b, ⊛ denotes the convolution operator and x(k) is the deconvolution result in the k-th iteration. In traditional RL deconvolution, b = f. In the OMW method we use, the backward projector is generated with these steps:

1. The OTF H of the PSF f is computed, \(H={\mathscr{F}}(f)\), where \({\mathscr{F}}(\cdot )\) represents the Fourier transform.

2. The OTF mask for the OTF support is segmented by applying a threshold to the amplitude \(\left\vert H\right\vert\). The threshold value is determined by a specified percentile (90% by default) of the accumulated sum of the values in \(\left\vert H\right\vert\), sorted from high to low.

3. The OTF mask undergoes a smoothing process, retaining only the central object, followed by convex hull filling. For deskewed space deconvolution, the three major components are kept after object smoothing and concatenated into a unified object along the z axis, followed by convex hull filling.

4. The distance matrix D is computed with the image center as 0 and the edge of the support as 1, using the ray distance from the center to the border of the whole image.

5. The distance matrix D is used to calculate the weight matrix W with the Hann window function for apodization, as expressed by the following formula:

$$w(x)=\left\{\begin{array}{ll}1\quad &x\le l\\ {\cos }^{2}\left(\frac{\pi (x-l)}{2(u-l)}\right)\quad &l < x\le u\\ 0\quad &x > u\end{array}\right.$$

where l and u are the lower and upper bounds for the relative distances. By default, l = 0.8 and u = 1 (edge of the support). For skewed space deconvolution, the weight matrix is computed from a single distance matrix obtained by adding together the distance matrices of the corresponding three components.

6. Calculate the Wiener filter \(F=\frac{{H}^{* }}{{\left\vert H\right\vert }^{2}+\alpha }\), where α is the Wiener parameter and H* denotes the conjugate transpose of H.

7. The backward projection in Fourier space is expressed as B = W ⊙ F, where ⊙ denotes the Hadamard product operator (element-wise multiplication), and the backward projector in real space is \(b={{\mathscr{F}}}^{-1}(B)\), where \({{\mathscr{F}}}^{-1}(\cdot )\) represents the inverse Fourier transform.
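The following MATLAB sketch summarizes steps 1, 2 and 4–7 for a centered PSF array psf. It is a simplified illustration, not our implementation: it omits the mask smoothing and convex hull filling of step 3, approximates the per-ray distance normalization of step 4 with a single scalar support radius, and uses an arbitrary example Wiener parameter.

```matlab
alpha = 1e-3;                              % Wiener parameter (example value)
H = fftshift(fftn(ifftshift(psf)));        % centered OTF of the PSF (step 1)
absH = abs(H);

% OTF support: threshold where the cumulative sum of sorted amplitudes
% reaches 90% of the total (step 2).
v = sort(absH(:), 'descend');
idx = find(cumsum(v) >= 0.9 * sum(v), 1, 'first');
mask = absH >= v(idx);

% Normalized distance from the OTF center (0 at center, ~1 at the support
% edge; a single scalar support radius replaces the per-ray distance).
sz = size(psf);
[yy, xx, zz] = ndgrid(1:sz(1), 1:sz(2), 1:sz(3));
cc = (sz + 1) / 2;
D = sqrt((yy - cc(1)).^2 + (xx - cc(2)).^2 + (zz - cc(3)).^2);
D = D ./ max(D(mask));

% Hann-window apodization between l = 0.8 and u = 1 (step 5).
l = 0.8;  u = 1.0;
W = zeros(sz, 'like', absH);
W(D <= l) = 1;
ramp = D > l & D <= u;
W(ramp) = cos(pi * (D(ramp) - l) ./ (2 * (u - l))).^2;

% Wiener filter (step 6) and real-space backward projector (step 7).
F = conj(H) ./ (absH.^2 + alpha);
B = W .* F;
b = fftshift(real(ifftn(ifftshift(B))));   % b replaces f in the RL update above
```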

The FSC method42 is used to determine the optimal number of traditional RL iterations and the optimal Wiener parameter in the OMW method. Here, the central portion of the volume, which is consistent in size across all three axes and covers sufficient content (for example, 202 × 202 × 202 for a volume of size 230 × 210 × 202), is used to compute the relative resolution. By default, the FSC is calculated with a radius of ten pixels and an angle interval of π/12. Cutoff frequencies for the relative resolution are determined using one-bit thresholding58 by default, or can be user-defined. The relative resolution across iterations (or different Wiener parameters) is plotted. In ref. 37, it was determined that a slightly higher threshold produced better results (Supplementary Fig. 1a). In practice, the optimal number of RL iterations or the Wiener parameter is defined by the value closest to 1.01 times the minimum of the curve, beyond the point where the curve reaches its minimum value.

Stitching

The stitching process requires a CSV meta-file documenting file names and corresponding coordinates. The pipeline consists of three steps: Tiff to Zarr conversion (or preprocessing), cross-correlation registration, and parallel block stitching (fusion). The overall stitching workflow is governed by a conductor job in the generic computing framework. For Tiff to Zarr conversion and/or processing of individual tiles, the conductor job distributes tasks to individual worker jobs, assigning one worker per tile. Each worker (a) reads its data with the Cpp-Tiff or Cpp-Zarr reader depending on the format (the latter if existing Zarr data need rechunking or preprocessing); (b) performs optional processing such as flipping, cropping, flat-field correction, edge erosion or other user-defined operations; and (c) writes the processed data using the Cpp-Zarr writer.

Following file conversion, stitching can be executed directly using the input tile coordinates, or normalized cross-correlation registration43 can be used first to refine and optimize the coordinates before stitching. In the registration, the conductor job utilizes coordinate information and tile indices to establish tile grids and identify neighboring tiles with overlaps. Cross-correlation registration is performed for overlapping tiles that are direct neighbors, defined as those whose tile indices differ by 1 in only one axis. To optimize computing time and memory usage, only the overlapping regions of the tiles are loaded, including a buffer determined by the maximum allowed shifts along the xyz axes within one tile. We can also downsample the overlapping data to achieve faster cross-correlation computation. The optimal shift between two tiles is identified as the one exhibiting the maximum correlation within the allowable shift limits. We include a feature to exclude shifts for pairs whose maximum correlation values are below a user-defined threshold. After completing the cross-correlation computation for all pairs of direct neighbor tiles, we determine the shifts for all tiles using either a local or a global method. The local approach is based on the concept of the minimum spanning tree, where the pairs of overlapping tiles are pruned to form a tree based on the correlation values from high to low, followed by registration with the pairwise optimal shifts.
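A minimal MATLAB sketch of this local approach is shown below, assuming a matrix c of pairwise maximum correlations (nonzero only for overlapping neighbor pairs stored with i < j) and an array d of the corresponding pairwise optimal shifts; variable names are illustrative and this is not the exact implementation.

```matlab
nTiles = size(c, 1);
[ii, jj, cc] = find(triu(c));              % overlapping pairs and their correlations
G = graph(ii, jj, 1 - cc, nTiles);         % low edge weight = high correlation
T = minspantree(G);                        % tree that keeps the strongest links

shifts  = zeros(nTiles, 3);                % absolute shift per tile (tile 1 = reference)
order   = bfsearch(T, 1);                  % visit tiles outward from the reference tile
visited = false(nTiles, 1);  visited(1) = true;
for q = order(2:end)'
    nb = neighbors(T, q);
    p  = nb(find(visited(nb), 1));         % parent: already-placed neighbor in the tree
    if p < q                               % d(i,j,:) stored for i < j
        shifts(q, :) = shifts(p, :) + squeeze(d(p, q, :))';
    else
        shifts(q, :) = shifts(p, :) - squeeze(d(q, p, :))';
    end
    visited(q) = true;
end
```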
In the global approach, the optimal final shifts are calculated from the pairwise relative shifts through a nonlinear constrained optimization process:

$$\mathop{\min }\limits_{x}\sum _{i,\,j}{w}_{ij}\parallel {x}_{i}-{x}_{j}-{d}_{ij}{\parallel }_{2}^{2}$$

$$s.t.\quad l < {x}_{i}-{x}_{j} < u$$

where x = {x1, …, xn} are the final shifts for the tiles, dij is the pairwise relative shift between tiles i and j, and wij is the weight between tiles i and j based on the squares of the maximum cross-correlation values. l and u are the lower and upper bounds for the maximum allowable shift, respectively. The goal is to position all tiles at optimal coordinates by minimizing the weighted sum of the squared differences between their distances and the pairwise relative shifts while adhering to the specified maximum allowable shifts.

For images collected by subregions (batches) that have different tile grids, we use the global method for tiles within each subregion. Subsequently, the subregions are treated as super nodes, and a nonlinear constrained optimization is applied to those nodes by minimizing the sum of squared differences between the centroid distances and the averaged shift distances:

$$\mathop{\min }\limits_{x}\sum _{i,\,j}\parallel {x}_{ri}-{x}_{rj}-{d}_{r,ij}{\parallel }_{2}^{2}$$

$$s.t.\quad {l}_{r} < {x}_{ri}-{x}_{rj} < {u}_{r}$$

where xri and xrj are the centroid coordinates for subregions i and j, and lr and ur are the lower and upper bounds for the maximum allowable shifts across subregions, respectively. The averaged shift distance, denoted as dr,ij, is determined by a weighted average of the absolute shifts across subregions, expressed as

$${d}_{r,ij}=\frac{{\sum }_{m\in {S}_{i},n\in {S}_{j}}{w}_{mn}{d}_{mn}}{{\sum }_{m\in {S}_{i},n\in {S}_{j}}{w}_{mn}}$$

where wmn is the cross-correlation value at the optimal shift between tiles m ∈ Si and n ∈ Sj, and Sk denotes the set of tiles in subregion k. Once the optimal shifts for the subregions are obtained, the last step is to reconstruct the optimal shifts for the tiles within each subregion by applying the optimal shift of the subregion centroid to the coordinates of the tiles in it. The final optimal shifts are then applied to the tile coordinates to determine their final positions.
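Because the objective is quadratic in x, each axis can be solved independently as a constrained linear least-squares problem. The following MATLAB sketch uses lsqlin (Optimization Toolbox) with illustrative variable names; pairs, w, dx, l and u are assumed inputs, and the first tile is pinned to zero to remove the global translation.

```matlab
% pairs: P-by-2 neighbor tile indices; w, dx: P-by-1 weights and relative
% shifts along one axis; l, u: scalar bounds on x_i - x_j (assumed inputs).
P = size(pairs, 1);  n = max(pairs(:));
C = zeros(P, n);  e = zeros(P, 1);
for p = 1:P
    i = pairs(p, 1);  j = pairs(p, 2);
    C(p, i) =  sqrt(w(p));
    C(p, j) = -sqrt(w(p));
    e(p)    =  sqrt(w(p)) * dx(p);          % weighted target differences
end
D = C ./ sqrt(w(:));                         % unweighted +1/-1 difference operator
A = [D; -D];
bvec = [repmat(u, P, 1); repmat(-l, P, 1)];  % encodes l <= x_i - x_j <= u
Aeq = [1, zeros(1, n - 1)];  beq = 0;        % pin tile 1 at zero shift
x = lsqlin(C, e, A, bvec, Aeq, beq);         % per-axis optimal shifts
```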
After registration, the conductor job determines the final stitched image size and the specific locations at which to place the tiles. To facilitate parallel stitching, the process is executed region by region in a nonoverlapping manner. These regions are saved directly as one or more distinct chunk files in Zarr format. For each region, information about the tiles therein and their corresponding bounding boxes is stored. The conductor job submits stitching tasks to worker jobs. If a region comes from one tile, the data for the region are saved directly. If a region spans multiple tiles, these must be merged into a single cohesive region. For the overlap regions, several blending options are available: ‘none’, ‘mean’, ‘median’, ‘max’ and ‘feather’. For the ‘none’ option, half of the overlap region is taken from each tile. For the ‘mean’, ‘median’ and ‘max’ options, the voxel values in the stitched region are calculated as the mean, median and maximum values of the corresponding voxels in the overlapping regions, respectively. Feather blending involves calculating the weighted average across the tiles44. The weights are powers of the distance transforms of the tiles:

$${w}_{i,m}={d}_{i,m}^{\alpha }/\left({d}_{i,m}^{\alpha }+{d}_{j,n}^{\alpha }\right)\quad \,\text{and}\,\quad {w}_{j,n}={d}_{j,n}^{\alpha }/\left({d}_{i,m}^{\alpha }+{d}_{j,n}^{\alpha }\right)$$

$${I}_{s,l}={w}_{i,m}{I}_{i,m}+{w}_{j,n}{I}_{j,n}$$

where di,m and dj,n are the distance transforms for voxel m in tile i and voxel n in tile j, α is the order (10 by default), Ii and Ij are the intensities for tiles i and j, and Is is the intensity of the stitched image s. Here we assume voxel m in tile i and voxel n in tile j are fused to voxel l in the stitched image. For the distance transform, we utilize a weighted approach, applying the distance transform to each z-slice and then applying the Tukey window function across z-slices to address the anisotropic voxel sizes. When all tiles are the same size, we compute the weight matrix for a single tile and apply it to all other tiles in the stitching process to save computing time. The final stitched image is obtained once all the subvolumes are processed.
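A minimal MATLAB sketch of the feather weights for two overlapping tiles already placed on a common grid is shown below; tileA and tileB are assumed inputs with zero-valued background, bwdist is from the Image Processing Toolbox, tukeywin is from the Signal Processing Toolbox, and the Tukey taper parameter is only an example.

```matlab
alphaOrd = 10;                                % feather order (default 10)
maskA = tileA > 0;  maskB = tileB > 0;        % assumed foreground masks
nz = size(tileA, 3);
tz = reshape(tukeywin(nz, 0.5), 1, 1, nz);    % example Tukey taper across z-slices
dA = zeros(size(tileA), 'single');
dB = zeros(size(tileB), 'single');
for z = 1:nz                                  % 2D distance transform, slice by slice
    dA(:, :, z) = bwdist(~maskA(:, :, z));
    dB(:, :, z) = bwdist(~maskB(:, :, z));
end
dA = dA .* tz;  dB = dB .* tz;                % weight slices by the Tukey window
wA = dA.^alphaOrd ./ (dA.^alphaOrd + dB.^alphaOrd + eps('single'));
fused = wA .* single(tileA) + (1 - wA) .* single(tileB);
```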
Large-scale processing

For stitching involving large tiles, where the intermediate steps above exceed memory capacities (including large stitched subregions), challenges arise in the registration and in the calculation of the distance transform for feather blending, due to the need to load large regions or tiles into memory. In such cases, we use MIP slabs for the registration and distance transform. These are computed across all three axes with downsampling factors [Mx, My, Mz, mx, my, mz]. The MIP slab for each specific axis is computed using the major downsampling factor Mi for that axis and the minor downsampling factors mj and mk for the other two. To enrich the signal for cross-correlation in sparse specimens, we use maximum pooling, that is, taking the maximum value in the neighborhood for the downsampling. Alternatively, we can also smooth the initial data by linear interpolation before maximum pooling. For the registration, normalized cross-correlation is calculated between direct neighbor tiles using all three MIP slabs, generating three sets of optimal shifts. The optimal shifts from the minor axes are then averaged to obtain the final optimal shifts, with weights assigned based on the squares of the cross-correlation values. For the distance transform, only the MIP slab along the z axis (major axis) is used to compute the weights for feather blending. In the stitching process, for overlapping regions, the downsampled weight regions are upsampled using linear interpolation to match the size of the regions in the stitching. The upsampled weights are then utilized for feather blending, following the same approach as that used for stitching with smaller tiles.

For large-scale deskew and rotation, tasks are divided across the y axis based on the size in the x and z axes, with a buffer of one or two pixels on both sides in the y axis. These tasks are then allocated to individual worker jobs for processing, with the results saved as independent Zarr regions on disk. MIP masks can be used to define a tight boundary around the object to optimize efficiency in data reading, processing and writing. We also perform deskewing and rotation for the MIP along the y axis to define the bounding box for the output in the xz axes. The geometric transformation function directly relies on this bounding box to determine the output view and minimize the empty regions, thereby further optimizing processing time, memory and storage requirements.

For large-scale deconvolution, tasks are distributed across all three axes, ensuring that regions occupy entire chunk files. An additional buffer, set to at least half of the PSF size (plus some extra size, 10 pixels by default), is included to eliminate edge artifacts. MIP masks are again used to define a tight specimen boundary to speed up computing. In a given task, all three MIP masks for the region are loaded and checked for empty ones. If a mask is empty, deconvolution is skipped, resulting in an output of zeros for that region.

Image processing and simulations

All images were processed using PetaKit5D. Flat-field correction was applied to the large field-of-view cell data (Fig. 5f–h), phase contrast data (Fig. 5c–e) and VNC data (Fig. 6e–j), with either experimentally collected flat-fields or ones estimated from the data using BaSiC software59.

The images used to benchmark different readers and writers, deskew/rotation and deconvolution algorithms were generated by cropping or replicating frames from a uint16 image of size 512 × 1,800 × 3,400. The striped line patterns used to compare deconvolution methods were simulated using the methodology outlined in ref. 37. The confocal PSF for the given pinhole size used in the striped line pattern simulation was generated based on the theoretical widefield PSF. We benchmarked large-scale stitching from 1 TiB to 1 PiB using one channel of the VNC dataset with 1,071 tiles, each sized 320 × 1,800 × 17,001. The datasets were created by either including specific numbers of tiles or replicating tiles across all three axes based on the total data size from 1 TiB to 1 PiB, as specified in Supplementary Table 4. We benchmarked large-scale deconvolution and deskew/rotation using the stitched VNC dataset (15,612 × 28,478 × 21,299, uint16) by either cropping or replicating the data in all three axes to generate the input datasets, as indicated in Supplementary Table 4.

Computing infrastructures

Our computing cluster has 38 CPU/GPU computing nodes: 30 CPU nodes (24 nodes with dual Intel Xeon Gold 6146 CPUs and 6 nodes with dual Intel Xeon Gold 6342 CPUs) and 8 GPU nodes (3 nodes with dual Intel Xeon Silver 4210R CPUs and 4 NVIDIA Titan V GPUs each, 4 nodes with dual Intel Xeon Gold 6144 CPUs and 4 NVIDIA A100 GPUs each, and 1 NVIDIA DGX A100 with dual AMD EPYC 7742 CPUs and 8 NVIDIA A100 GPUs). The Intel Xeon Gold 6146 CPU and GPU nodes have 512 GB RAM per node, the Intel Xeon Gold 6342 CPU nodes have 1,024 GB RAM per node, and the NVIDIA DGX A100 has 2 TB RAM. Hyperthreading was disabled on all Intel CPUs. Benchmarks were performed on hardware aged approximately 3 to 4 years. We have four flash data servers: a 70 TB server (SSD, Supermicro), two 300 TB servers (NVMe, Supermicro) and a 1,000 TB parallel file system (VAST Data). We also accessed the Perlmutter supercomputer at the National Energy Research Scientific Computing Center (NERSC), with both CPU and GPU nodes. Each CPU node is equipped with two AMD EPYC 7713 CPUs and 512 GB RAM; each GPU node has a single AMD EPYC 7713 CPU, four NVIDIA A100 GPUs and 256 or 512 GB RAM.

Microscope hardware

Light sheet imaging was performed on a lattice light sheet microscope comparable to a published system55.
Two lasers, 488 nm and 560 nm (500 mW, MPB Communications 2RU-VFL-P-500-488-B1R and 2RU-VFL-P-1000-560-B1R), were used as the light sources. A water immersion excitation objective (EO, Thorlabs TL20X-MPL) and a detection objective (DO, Zeiss, ×20, 1.0 NA, 1.8 mm FWD, 421452-9800-000) were used for imaging with an sCMOS camera (Hamamatsu ORCA Fusion). Oblique illumination microscopy was also performed on the modified lattice light sheet microscope using a 642-nm laser illuminated through the EO and imaged using an inverted DO (Zeiss, ×20, 1.0 NA, 1.8 mm FWD, 421452-9880-000). Widefield and confocal imaging were performed on an Andor BC43 Benchtop Confocal Microscope (Oxford Instruments) with a Nikon Plan Apo ×40, 1.25 NA silicone immersion objective (Nikon, MRD73400), a 488-nm laser (Oxford Instruments, Andor Borealis) and a modified Andor Zyla sCMOS camera (Oxford Instruments, 4.1 MP, 6.5-μm pixel size). Two-photon microscopy was performed on a custom-built microscope equipped with an upright DO (Zeiss, ×20, 1.0 NA, 1.8-mm FWD, 421452-9880-000), a pulsed laser (Coherent, Chameleon LS), a deformable mirror (ALPAO, DM69) and MPPC modules (Hamamatsu, C13366-3050GA and C14455-3050GA). The imaging conditions for the datasets can be found in Supplementary Table 5.

Cell culture and imaging

Pig kidney epithelial cells (LLC-PK1, a gift from M. Davidson at Florida State University) and HeLa cells were cultured in DMEM with GlutaMAX (Gibco, 10566016) supplemented with 10% FBS (Seradigm) in an incubator with 5% CO2 at 37 °C and 100% humidity. LLC-PK1 cells stably expressing the endoplasmic reticulum marker mEmerald-Calnexin and the chromosome marker mCherry-H2B were grown on coverslips (Thorlabs, CG15XH) coated with 200-nm diameter fluorescent beads (Invitrogen FluoSpheres Carboxylate-Modified Microspheres, 505/515 nm, F8811). When cells reached 30–80% confluency, they were imaged at 37 °C in Leibovitz’s L-15 Medium without Phenol Red (Gibco, 21-083-027) with 5% FBS (ATCC SCRR-30-2020) and an antibiotic cocktail consisting of 0.1% ampicillin (Thermo Fisher, 611770250), 0.1% kanamycin (Thermo Fisher, 11815024) and 0.1% penicillin–streptomycin (Thermo Fisher, 15070063). HeLa cells were cultivated on 25-mm coverslips until approximately 50% confluency was achieved. They were imaged in the same media as above.

Mouse brain sample preparation and imaging

All mouse experiments were conducted at Janelia Research Campus, Howard Hughes Medical Institute (HHMI) in accordance with the US National Institutes of Health Guide for the Care and Use of Laboratory Animals. Procedures and protocols were approved by the Institutional Animal Care and Use Committee of the Janelia Research Campus, HHMI. Mice were housed in a specific-pathogen-free condition on individually ventilated racks with 100% outside filtered air in the holding room. They were maintained on a 12–12-h light–dark cycle at 20–22 °C with 30–70% relative humidity.

Transgenic Thy1-YFP-H mice (The Jackson Laboratory), 8 weeks or older, with cytosolic expression of yellow fluorescent protein (YFP) at high levels in motor, sensory and subsets of central nervous system neurons, were anesthetized with isoflurane (1–2% by volume in oxygen) and placed on a heated blanket. An incision was made in the scalp, followed by removal of the exposed skull. A cranial window made of a single 170-μm-thick coverslip was embedded in the craniotomy. The cranial window and a headbar were sealed in place with dental cement for subsequent imaging.
A direct wavefront sensing method60 was used for adaptive optical correction before image acquisition. Aberrations at each volumetric tile were independently measured and corrected using a pupil-conjugated deformable mirror, and images were acquired at 16 Hz using Hamamatsu MPPC modules.

Fly VNC sample preparation and imaging

A genetically modified strain of fruit flies (Drosophila melanogaster) was raised on a standard cornmeal-agar-based medium in a controlled environment of 25 °C on a 12–12-h light–dark cycle. On the day of eclosion, female flies were collected and group housed for 4–6 days. The genotype was VGlutMI04979-LexA:QFAD/MN-GAL4 (attp40); 13XLexAop-Syn21-mScarlet [JK65C], 20XUAS-Syn21-GFP [attp2]/MN-GAL4 [attp2]61,62. Dissection and immunohistochemistry of the fly VNC were performed following the protocol in ref. 63 with minor modifications. The primary antibodies were chicken anti-GFP (1:1,000 dilution; Abcam, ab13970) and rabbit anti-dsRed (1:1,000 dilution; Takara Bio, 632496). The secondary antibodies were goat anti-chicken IgY Alexa Fluor 488 (1:500 dilution; Invitrogen, A11039) and goat anti-rabbit IgG Alexa Fluor 568 (1:500 dilution; Invitrogen, A11011). VNC samples were prepared for 8× expansion as described in ref. 63. The imaging protocol for the expanded VNC sample was identical to that described in ref. 12.

Visualization and software

Lattice light sheet images were acquired with LabView (National Instruments) software. Videos were made with Imaris (Oxford Instruments), Fiji33, Amira (Fisher Scientific), NVIDIA IndeX (NVIDIA) and MATLAB R2023a (MathWorks) software. Figures were made with MATLAB R2023a (MathWorks). Python (3.8.8) with the Zarr-Python (2.16.1), tifffile (2023.7.10), TensorStore (0.1.45), pyclesperanto-prototype (0.24.2), qi2lab-OPM (a734490) and clij2-fft (0.26) libraries was used for benchmarking image readers and writers, deskew and rotation, and deconvolution. The traditional RL deconvolution method is an accelerated version of the original RL algorithm40,64. It was implemented and adapted from MATLAB’s ‘deconvlucy.m’ with enhancements such as GPU computing and customized parameters. Backward projectors for the WB deconvolution method were generated using the code from https://github.com/eguomin/regDeconProject/. Spark versions of BigStitcher (https://github.com/JaneliaSciComp/BigStitcher-Spark/) and stitching-spark (https://github.com/saalfeldlab/stitching-spark/) were used for the stitching comparison. NVIDIA IndeX can be obtained from https://developer.nvidia.com/index/ with a free license for noncommercial research and education.

Reporting summary

Further information on research design is available in the Nature Portfolio Reporting Summary linked to this article.
