Parquet Viewer: Complete Feature Guide & Reference
🗜️ Open the Parquet Viewer to explore every feature described in this guide.
What Is the Parquet Viewer?
The FinancialDataTools.com Parquet Viewer is a free, browser-based tool for opening and exploring Apache Parquet files. It uses DuckDB-Wasm — the official WebAssembly build of the DuckDB analytical database — to read Parquet files natively and efficiently, entirely inside your browser. No file is ever transmitted to any server.
Parquet is a columnar storage format widely used in data engineering, machine learning, and financial data pipelines. Inspecting a Parquet file has traditionally required Python, Spark, or a data warehouse connection. With the Parquet Viewer you can open a Parquet file in seconds, directly in your browser.
How DuckDB-Wasm Reads Parquet
When you open a Parquet file in the viewer, the following steps happen entirely inside your browser:
- DuckDB-Wasm is initialized in a Web Worker (keeping the UI responsive during loading)
- Your file is registered in DuckDB-Wasm's in-memory virtual file system using `registerFileBuffer`
- Column metadata is read using DuckDB's `DESCRIBE SELECT * FROM read_parquet('file.parquet') LIMIT 0`; this reads only the Parquet file footer metadata, not the row data, so it's fast even for large files
- The total row count is retrieved with a `SELECT COUNT(*)` query
- Row data is fetched with `SELECT * FROM read_parquet('file.parquet') LIMIT 5000`
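The three SQL statements in the steps above can be sketched as a small query builder. This is only an illustration, not the viewer's actual source: the names `buildLoadQueries`, `LoadQueries`, and `sqlString` are hypothetical, and the single-quote escaping is an assumption about how a file name would be made safe inside a SQL string literal.

```typescript
// Hypothetical helper that builds the three statements described above.
// All names here are illustrative and not taken from the viewer's code.

interface LoadQueries {
  describe: string; // schema only: LIMIT 0 means no row data is needed
  count: string;    // total row count for the stats bar
  rows: string;     // first batch of rows for the grid
}

// Escape single quotes so the file name is safe inside a SQL string literal.
function sqlString(value: string): string {
  return "'" + value.replace(/'/g, "''") + "'";
}

function buildLoadQueries(fileName: string, rowLimit: number = 5000): LoadQueries {
  const source = `read_parquet(${sqlString(fileName)})`;
  return {
    describe: `DESCRIBE SELECT * FROM ${source} LIMIT 0`,
    count: `SELECT COUNT(*) FROM ${source}`,
    rows: `SELECT * FROM ${source} LIMIT ${rowLimit}`,
  };
}

// Usage: each string would be passed to the DuckDB-Wasm connection in turn.
const loadQueries = buildLoadQueries("file.parquet");
console.log(loadQueries.describe);
console.log(loadQueries.count);
console.log(loadQueries.rows);
```

Running the schema and count queries before fetching rows lets the interface show column headers and the total row count before any row data arrives.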
DuckDB's Parquet reader handles the columnar decoding, decompression, and type conversion automatically. The viewer then converts DuckDB's Apache Arrow result format to JavaScript arrays for display.
The viewer selects a suitable DuckDB-Wasm bundle automatically: the eh (exception handling) bundle for browsers that support WebAssembly exception handling, or the mvp bundle as a fallback.
Supported Parquet Versions and Codecs
DuckDB's Parquet reader supports the standard Parquet format versions and the commonly used compression codecs:
| Parquet Version | Support |
|---|---|
| Parquet 1.0 | Fully supported |
| Parquet 2.x | Fully supported, including all 2.x encodings |

| Compression Codec | Support |
|---|---|
| Uncompressed | ✓ |
| Snappy | ✓ (most common default) |
| Gzip | ✓ |
| LZ4 | ✓ |
| Zstandard (Zstd) | ✓ |
| Brotli | ✓ |
| LZO | Limited (rarely used in practice) |
Supported Data Types
DuckDB maps all standard Parquet physical and logical types to its own type system. The viewer infers a display type from the DuckDB type and shows a color-coded badge on each column header:
| Badge | DuckDB Types | Display Behavior |
|---|---|---|
| INT | TINYINT, SMALLINT, INTEGER, BIGINT, HUGEINT, UBIGINT, UINTEGER | Right-aligned in blue; bigint values converted to JavaScript numbers (integers beyond ±(2^53 − 1) cannot be represented exactly) |
| FLOAT | FLOAT, DOUBLE, REAL, DECIMAL, NUMERIC | Right-aligned in amber |
| BOOL | BOOLEAN | Displayed as true/false in purple |
| DATE | DATE, TIME, TIMESTAMP, INTERVAL | ISO 8601 formatted; timestamps shown as YYYY-MM-DD HH:MM:SS |
| BLOB | BLOB, BINARY, BYTEA | Shown in pink; binary values not displayed as text |
| TEXT | VARCHAR, TEXT, and all other types | Default string display; green badge |
Nested Parquet types (LIST, MAP, STRUCT) are automatically serialized to their JSON string representation for display in the grid. Click any cell containing a nested value to see the full pretty-printed JSON in the Cell Detail Panel.
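The badge table above amounts to a lookup from a DuckDB type string to a display category. The sketch below is one way to express that lookup, not the viewer's implementation: `badgeFor` and `BADGE_BY_BASE_TYPE` are hypothetical names, and the handling of parameterized types (taking the leading keyword) and of nested types (treating `T[]`, `MAP(...)`, and `STRUCT(...)` as TEXT) is an assumption based on the "all other types" row.

```typescript
type Badge = "INT" | "FLOAT" | "BOOL" | "DATE" | "BLOB" | "TEXT";

// Base type names taken from the table above; anything else falls back to TEXT.
const BADGE_BY_BASE_TYPE: Record<string, Badge> = {
  TINYINT: "INT", SMALLINT: "INT", INTEGER: "INT", BIGINT: "INT",
  HUGEINT: "INT", UBIGINT: "INT", UINTEGER: "INT",
  FLOAT: "FLOAT", DOUBLE: "FLOAT", REAL: "FLOAT", DECIMAL: "FLOAT", NUMERIC: "FLOAT",
  BOOLEAN: "BOOL",
  DATE: "DATE", TIME: "DATE", TIMESTAMP: "DATE", INTERVAL: "DATE",
  BLOB: "BLOB", BINARY: "BLOB", BYTEA: "BLOB",
};

function badgeFor(duckdbType: string): Badge {
  const upper = duckdbType.trim().toUpperCase();
  // Assumption: nested types are displayed as JSON text, so they get TEXT.
  if (upper.slice(-2) === "[]" || upper.indexOf("MAP(") === 0 || upper.indexOf("STRUCT(") === 0) {
    return "TEXT";
  }
  // Leading keyword only: "DECIMAL(18,6)" -> "DECIMAL",
  // "TIMESTAMP WITH TIME ZONE" -> "TIMESTAMP".
  const match = upper.match(/^[A-Z]+/);
  const base = match ? match[0] : "";
  const badge = BADGE_BY_BASE_TYPE[base];
  return badge !== undefined ? badge : "TEXT";
}

console.log(badgeFor("DECIMAL(18,6)"));
console.log(badgeFor("TIMESTAMP WITH TIME ZONE"));
console.log(badgeFor("INTEGER[]"));
```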
The Toolbar
| Button | Function |
|---|---|
| Open File | Opens a system file picker to select your .parquet file |
| Schema | Opens the column schema modal showing column names and DuckDB types |
| Export | Opens the export dialog |
| File name display | Shows the currently loaded file name |
| Search box | Global text search across all visible columns |
You can also drag and drop a Parquet file anywhere on the viewer to open it without using the Open File button. While the file loads, the viewer shows step-by-step progress messages: Initialising DuckDB-Wasm… → Reading Parquet metadata… → Loading rows…
Stats Bar
- Rows: Total number of rows in the Parquet file (from `COUNT(*)`)
- Showing: Number of rows visible after applying filters
- Cols: Number of columns in the schema
- Engine: Always shows `DuckDB-Wasm` to confirm the reading engine
- Filter badge: Pink badge showing active column filter count; click to clear all
Sorting Columns
Click any column header to sort ascending, click again for descending, and click a third time to return to the original row order. Sorting is applied client-side to the currently loaded page of rows. For globally sorted results across an entire large file, use the DuckDB Viewer, which supports ORDER BY queries via its SQL panel.
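The three-click cycle and the page-local sort can be sketched as follows. This is a minimal illustration under stated assumptions, not the viewer's code: the names `nextSortDirection` and `sortPage` are hypothetical, and placing nulls last in both directions is an assumption the guide does not confirm.

```typescript
type SortDirection = "none" | "asc" | "desc";
type SortCell = string | number | null;
type SortRow = SortCell[];

// Cycle: original order -> ascending -> descending -> original order.
function nextSortDirection(current: SortDirection): SortDirection {
  if (current === "none") return "asc";
  if (current === "asc") return "desc";
  return "none";
}

// Numbers compare numerically; everything else compares as text.
function compareValues(x: string | number, y: string | number): number {
  if (typeof x === "number" && typeof y === "number") return x - y;
  const sx = String(x);
  const sy = String(y);
  return sx < sy ? -1 : sx > sy ? 1 : 0;
}

// Sorts a copy of the loaded page, leaving the original order untouched
// so that the third click can restore it.
function sortPage(rows: SortRow[], columnIndex: number, direction: SortDirection): SortRow[] {
  if (direction === "none") return rows.slice();
  const sign = direction === "asc" ? 1 : -1;
  return rows.slice().sort((a, b) => {
    const x = a[columnIndex] ?? null; // a missing cell is treated as null
    const y = b[columnIndex] ?? null;
    if (x === null || y === null) {
      if (x === null && y === null) return 0;
      return x === null ? 1 : -1; // assumption: nulls last in both directions
    }
    return sign * compareValues(x, y);
  });
}
```

Because only the loaded page is sorted, the top row after an ascending sort is the smallest value on that page, not necessarily the smallest value in the file.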
Row Filtering
Click the filter icon in any column header to open the column filter panel. Two modes are available:
- Values mode: A checklist of all distinct values in that column, sampled from the currently loaded page. Uncheck values to hide matching rows.
- Conditions mode: Apply up to two conditions using operators: contains, equals, does not equal, begins with, ends with, greater than, less than, is empty/null, is not empty/null, and more. Combine two conditions with AND or OR logic.
Column filters operate on the currently loaded page of rows. For a file with millions of rows, only the current page (the first 5,000 rows when the file opens) is in memory for filtering. To filter an entire very large Parquet file, use the DuckDB Viewer with a WHERE clause so the filter runs inside the query engine rather than on the loaded page.
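Conditions mode can be pictured as one or two predicates joined by AND or OR and evaluated against each cell on the loaded page. The sketch below covers the operators named above; the names `testCondition` and `passesFilter` are hypothetical, and case-insensitive matching plus numeric comparison when both sides parse as numbers are assumptions, not documented behavior.

```typescript
type FilterCell = string | number | boolean | null;
type FilterOperator =
  | "contains" | "equals" | "doesNotEqual" | "beginsWith" | "endsWith"
  | "greaterThan" | "lessThan" | "isEmpty" | "isNotEmpty";

interface FilterCondition {
  operator: FilterOperator;
  value: string; // ignored by isEmpty and isNotEmpty
}

// Compare numerically when both sides are numeric, otherwise as text.
function compareText(cellText: string, target: string): number {
  const a = Number(cellText);
  const b = Number(target);
  if (cellText !== "" && target !== "" && isFinite(a) && isFinite(b)) return a - b;
  return cellText < target ? -1 : cellText > target ? 1 : 0;
}

function testCondition(cell: FilterCell, condition: FilterCondition): boolean {
  const text = cell === null ? "" : String(cell).toLowerCase();
  const target = condition.value.toLowerCase();
  switch (condition.operator) {
    case "isEmpty": return text === "";
    case "isNotEmpty": return text !== "";
    case "contains": return text.indexOf(target) !== -1;
    case "equals": return text === target;
    case "doesNotEqual": return text !== target;
    case "beginsWith": return text.slice(0, target.length) === target;
    case "endsWith": return text.slice(Math.max(0, text.length - target.length)) === target;
    case "greaterThan": return text !== "" && compareText(text, target) > 0;
    case "lessThan": return text !== "" && compareText(text, target) < 0;
    default: return false;
  }
}

// One condition, or two joined with AND / OR.
function passesFilter(
  cell: FilterCell,
  first: FilterCondition,
  second?: FilterCondition,
  join: "AND" | "OR" = "AND",
): boolean {
  const a = testCondition(cell, first);
  if (second === undefined) return a;
  const b = testCondition(cell, second);
  return join === "AND" ? a && b : a || b;
}
```

For example, `passesFilter(150, { operator: "greaterThan", value: "100" }, { operator: "lessThan", value: "200" })` keeps a price of 150 and would reject 95 or 250.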
Global Search
The toolbar search input performs a real-time text search across all visible columns. Rows where no column contains the search term are hidden. Global search operates on the currently loaded page and stacks with active column filters.
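As a rough sketch of the behavior described above, global search can be modeled as a row filter applied to whatever rows survive the column filters. The name `searchRows` is hypothetical, and case-insensitive substring matching is an assumption.

```typescript
type SearchCell = string | number | boolean | null;
type SearchRow = SearchCell[];

// Keeps rows in which at least one visible cell contains the search term.
// `rows` is expected to hold only the visible columns, already narrowed
// by any active column filters (search stacks on top of them).
function searchRows(rows: SearchRow[], term: string): SearchRow[] {
  const needle = term.trim().toLowerCase();
  if (needle === "") return rows; // an empty search hides nothing
  return rows.filter((row) =>
    row.some((cell) => cell !== null && String(cell).toLowerCase().indexOf(needle) !== -1),
  );
}
```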
Cell Detail Panel
Clicking any cell opens the Cell Detail Panel on the right side of the viewer. This panel shows the row number, column name, DuckDB type (with full type string like DECIMAL(18,6)), character length, and the full cell value. Nested values (LIST, MAP, STRUCT) are automatically pretty-printed as JSON for easy inspection. A Copy value button copies the raw value to the clipboard.
Schema Inspector
Click the Schema button to open the schema modal. For each column it shows:
- Column name: as stored in the Parquet file metadata
- Type: the full DuckDB type string (e.g., `DECIMAL(18,6)`, `TIMESTAMP WITH TIME ZONE`, `VARCHAR`)
The Copy Schema button copies the complete column list as plain text. This is particularly useful when setting up a target database table or documenting a Parquet schema for a data contract.
Pagination
Parquet files with more than 50,000 rows are automatically paginated to 5,000 rows per page. DuckDB-Wasm fetches each page using SELECT * FROM read_parquet('file') LIMIT 5000 OFFSET N, loading only the rows being displayed into browser memory. This makes the viewer memory-efficient even for files with millions of rows.
The page bar shows the current page, total pages, and absolute row range. When search or column filters are active, the filtered row count is used for pagination and page navigation.
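The page arithmetic behind the LIMIT/OFFSET query and the page bar can be sketched like this. The constants come from the figures above; the function names (`isPaginated`, `totalPages`, `pageQuery`, `rowRange`) and 1-based page numbering are assumptions made for illustration.

```typescript
const PAGE_SIZE = 5000;             // rows per page, as described above
const PAGINATION_THRESHOLD = 50000; // files above this row count are paginated

function isPaginated(rowCount: number): boolean {
  return rowCount > PAGINATION_THRESHOLD;
}

function totalPages(rowCount: number, pageSize: number = PAGE_SIZE): number {
  return Math.max(1, Math.ceil(rowCount / pageSize));
}

// Builds the query for a 1-based page number.
function pageQuery(fileName: string, page: number, pageSize: number = PAGE_SIZE): string {
  const offset = (page - 1) * pageSize;
  const literal = "'" + fileName.replace(/'/g, "''") + "'";
  return `SELECT * FROM read_parquet(${literal}) LIMIT ${pageSize} OFFSET ${offset}`;
}

// Absolute row range shown in the page bar, both ends 1-based and inclusive.
function rowRange(page: number, rowCount: number, pageSize: number = PAGE_SIZE): [number, number] {
  const first = (page - 1) * pageSize + 1;
  const last = Math.min(page * pageSize, rowCount);
  return [first, last];
}

console.log(pageQuery("file", 3));
console.log(totalPages(120001), rowRange(25, 120001));
```

With 120,001 rows, the last page (25) holds a single row, which is why the range is clamped to the total row count.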
Export Options
Click Export in the toolbar to open the export dialog. Four formats and two scopes are available:
| Format | Best For | Notes |
|---|---|---|
| CSV | Spreadsheets, Python/pandas, data pipelines | UTF-8; NULL as empty string; nested types as JSON strings |
| JSON | APIs, downstream processing | Array of objects; column names as keys; null preserved |
| Excel (.xlsx) | Sharing with stakeholders | Frozen header row; auto-sized columns; attribution sheet |
| TSV | Tab-separated import targets | Useful when values contain commas |
Two export scopes:
- Filtered view: Exports only the rows visible after applying search and column filters (from the currently loaded page)
- Full file: Issues a `SELECT * FROM read_parquet('file')` query to DuckDB-Wasm and exports all rows, bypassing any active filters and page limits. For very large files this may take a moment.
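The CSV rules in the table above (NULL as an empty string, nested types as JSON strings) can be illustrated with a small serializer. This is a sketch, not the viewer's exporter: `csvField` and `toCsv` are hypothetical names, and the RFC 4180 style quoting and CRLF line endings are assumptions.

```typescript
type CsvCell = string | number | boolean | null | object;

function csvField(value: CsvCell): string {
  if (value === null) return ""; // NULL is written as an empty field
  // Nested values (arrays and objects) are written as JSON strings.
  const text = typeof value === "object" ? JSON.stringify(value) : String(value);
  // Quote fields containing a comma, double quote, or line break,
  // and double any embedded quotes.
  return /[",\r\n]/.test(text) ? '"' + text.replace(/"/g, '""') + '"' : text;
}

function toCsv(columns: string[], rows: CsvCell[][]): string {
  const lines = [columns.map(csvField).join(",")];
  for (const row of rows) {
    lines.push(row.map(csvField).join(","));
  }
  return lines.join("\r\n");
}

console.log(toCsv(["symbol", "price", "tags"], [["AAPL", 189.5, ["tech", "us"]]]));
```

The same function works for either export scope: the caller passes the filtered rows from the current page or the rows returned by the full-file query.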
Privacy & Security
The Parquet Viewer processes your file entirely inside your browser via DuckDB-Wasm. Your file is registered in an in-memory virtual file system and never transmitted to any server. DuckDB-Wasm reads and decompresses the Parquet format locally within the WebAssembly runtime.
Network requests from the viewer are limited to loading the DuckDB-Wasm runtime from the jsDelivr CDN and the ExcelJS library for Excel export. No row data, schema information, or file contents leave your browser.
This makes the viewer appropriate for sensitive financial Parquet files including:
- Trade and position data from internal data warehouses
- Financial model outputs and backtesting results
- Market data snapshots from data vendor pipelines
- Client portfolio data exported from analytics platforms
Closing the browser tab clears all data from memory immediately. No data is written to localStorage or any persistent browser storage.
Use Cases for Financial Data
Parquet has become one of the most widely used columnar storage formats in financial data engineering. Common scenarios where the viewer adds immediate value:
- Data pipeline validation: Parquet files produced by Spark, Flink, or dbt pipelines can be quickly inspected to verify schema, row counts, and sample values before deploying a pipeline to production.
- Python and pandas workflows: Financial analysts who save DataFrames to Parquet using `df.to_parquet()` can inspect those files without opening a Jupyter notebook.
- Market data archives: Tick data, OHLCV price history, and order book snapshots stored in Parquet by data vendors or internal pipelines can be browsed and sampled without loading them into a DataFrame.
- Feature stores and ML pipelines: Parquet files output by feature engineering pipelines for financial ML models can be inspected to verify feature distributions, null rates, and value ranges.
- Data warehouse exports: Tables exported from Snowflake, BigQuery, Redshift, or Databricks in Parquet format can be opened directly without re-uploading to a warehouse or running another query.
- DuckDB companion: If you use DuckDB Viewer for .duckdb files, the Parquet Viewer handles standalone Parquet exports from those same workflows — the two viewers share the same DuckDB-Wasm engine and complement each other for different stages of your data work.
