
How to Open & Browse a Parquet File: Step-by-Step Tutorial

By FinancialDataTools.com Team · March 2026 · 8 min read · Last updated March 14, 2026

🗜️ Open the Parquet Viewer and follow along with this tutorial.

Steps

Locate Your Parquet File
Open the Parquet Viewer
Load Your File
Understand the Loading Process
Browse Columns and Rows
Sort and Filter Data
Inspect Cell Values
Inspect the Schema
Export Your Data

This tutorial walks you through opening and exploring an Apache Parquet file using the free FinancialDataTools.com Parquet Viewer. The tool uses DuckDB-Wasm — the official WebAssembly build of the DuckDB analytical engine — to read Parquet natively in your browser. Nothing is sent to any server.

Step 1: Locate Your Parquet File

Find the .parquet file you want to inspect. Parquet is a widely used columnar storage format for analytical data and appears in many financial data engineering workflows:

DataFrames saved with df.to_parquet('file.parquet') in Python/pandas
Output files from Apache Spark, Flink, or dbt data pipelines
Market data archives from financial data vendors (tick data, OHLCV history)
Feature store outputs from financial ML pipelines
Tables exported from Snowflake, BigQuery, Redshift, or Databricks in Parquet format
Backtest result datasets stored in Parquet by algorithmic trading frameworks

The viewer reads files written with either Parquet format version (1.0 or 2.x), whether they are compressed with one of the common codecs (Snappy, Gzip, LZ4, or Zstandard) or stored uncompressed.

Step 2: Open the Parquet Viewer

Navigate to financialdatatools.com/viewers/parquet-viewer/ in any modern browser. No login, account, or installation is required, though the viewer works best on a desktop browser.

Step 3: Load Your File

There are two ways to open your Parquet file:

Click the green "Open File" button in the toolbar and select your .parquet file.
Drag and drop your file anywhere onto the viewer window.

Step 4: Understand the Loading Process

Unlike CSV or JSON, Parquet is a binary columnar format that cannot be read as plain text; it needs a Parquet-aware reader. The viewer uses DuckDB-Wasm as that reader, and the engine initializes in a background Web Worker. Loading happens in three stages shown in the status indicator:

Initialising DuckDB-Wasm — the DuckDB engine loads into the browser (first load only; subsequent files use the cached engine)
Reading Parquet metadata — DuckDB reads the Parquet file footer to get column names, types, and row count without loading any row data yet
Loading rows — the first page of up to 5,000 rows is fetched

For most Parquet files, this entire process completes within a few seconds. Files with many row groups or complex nested schemas may take longer during the metadata step.
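The metadata stage is fast because of how Parquet is laid out on disk: a file begins with the 4-byte magic string PAR1 and ends with the Thrift-encoded footer, a 4-byte little-endian footer length, and PAR1 again. The Python sketch below is not the viewer's code (the name read_parquet_footer is hypothetical); it only shows why a reader can locate the schema and row count by reading the tail of the file. Decoding the Thrift bytes into column names and types is left to a real Parquet library such as DuckDB.

```python
import io
import struct
from typing import BinaryIO

MAGIC = b"PAR1"  # 4-byte marker at the start and end of every Parquet file


def read_parquet_footer(f: BinaryIO) -> bytes:
    """Return the raw Thrift-encoded footer (FileMetaData) of a Parquet file.

    Tail layout: ... <footer bytes> <footer length: uint32 little-endian> b"PAR1"
    Only the first 4 bytes and the tail are read; row data is never touched.
    """
    f.seek(0, io.SEEK_END)
    size = f.tell()
    if size < 12:  # leading magic + length + trailing magic is the bare minimum
        raise ValueError("file too small to be Parquet")
    f.seek(0)
    if f.read(4) != MAGIC:
        raise ValueError("missing leading PAR1 magic")
    f.seek(size - 8)
    tail = f.read(8)
    if tail[4:] != MAGIC:
        raise ValueError("missing trailing PAR1 magic")
    (footer_len,) = struct.unpack("<I", tail[:4])
    if footer_len > size - 12:
        raise ValueError("footer length exceeds file size")
    f.seek(size - 8 - footer_len)
    return f.read(footer_len)


# With a real file, open it in binary mode so only the footer is read:
# with open("trades.parquet", "rb") as f:
#     footer = read_parquet_footer(f)
```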

Once loaded, the stats bar shows the total row count (from COUNT(*)), visible rows, column count, and the engine label DuckDB-Wasm.

Step 5: Browse Columns and Rows

Your data appears in a spreadsheet-style grid. Each column header shows:

The column name as stored in the Parquet file metadata
A type badge showing the DuckDB type category — INT, FLOAT, TEXT, BOOL, DATE, or BLOB
A tooltip with the full type string (e.g., DECIMAL(18,6), TIMESTAMP WITH TIME ZONE) when you hover over the badge
A filter button for column-level filtering

Numeric columns (INT, FLOAT) are right-aligned and shown in blue. Boolean values appear in purple as true/false. Date and timestamp values are shown in ISO 8601 format (YYYY-MM-DD or YYYY-MM-DD HH:MM:SS). Nested types (LIST, MAP, STRUCT) are shown as their JSON string representation.
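The viewer's exact mapping from DuckDB types to badges is not documented here, so the following is a hypothetical sketch of how a full type string could be reduced to one of the six badge categories. Treating nested LIST, STRUCT, and MAP columns as TEXT is an assumption based on their being displayed as JSON strings.

```python
INT_TYPES = {"TINYINT", "SMALLINT", "INTEGER", "BIGINT", "HUGEINT",
             "UTINYINT", "USMALLINT", "UINTEGER", "UBIGINT", "UHUGEINT"}
FLOAT_TYPES = {"FLOAT", "REAL", "DOUBLE", "DECIMAL", "NUMERIC"}


def badge_for(duckdb_type: str) -> str:
    """Map a full DuckDB type string to a badge: INT, FLOAT, TEXT, BOOL, DATE, or BLOB."""
    t = duckdb_type.strip().upper()
    # Nested types (lists such as INTEGER[], STRUCT(...), MAP(...)) are shown as JSON text.
    if t.endswith("]") or t.startswith(("STRUCT", "MAP", "UNION")):
        return "TEXT"
    base = t.split("(", 1)[0].strip()  # "DECIMAL(18,6)" -> "DECIMAL"
    if base in INT_TYPES:
        return "INT"
    if base in FLOAT_TYPES:
        return "FLOAT"
    if base in ("BOOLEAN", "BOOL"):
        return "BOOL"
    if base.startswith(("DATE", "TIME")):  # DATE, TIME, TIMESTAMP, TIMESTAMP WITH TIME ZONE
        return "DATE"
    if base == "BLOB":
        return "BLOB"
    return "TEXT"  # VARCHAR, UUID, INTERVAL, and anything unrecognized


def is_right_aligned(badge: str) -> bool:
    """Numeric columns are right-aligned, per the display rules above."""
    return badge in ("INT", "FLOAT")
```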

Row numbers appear on the left side of the grid. For paginated files, row numbers reflect absolute positions across the entire file.
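Absolute row numbers follow naturally if each page is fetched with LIMIT and OFFSET. This sketch assumes that paging scheme (the viewer may page differently) and that rows come back in file order; page_query and absolute_row_numbers are hypothetical helpers using the 5,000-row page size from Step 4.

```python
PAGE_SIZE = 5_000  # the "up to 5,000 rows" page size from Step 4


def page_query(path: str, page: int, page_size: int = PAGE_SIZE) -> str:
    """SQL for one zero-based page of a Parquet file (assumes LIMIT/OFFSET paging)."""
    if page < 0:
        raise ValueError("page must be >= 0")
    safe_path = path.replace("'", "''")  # escape quotes inside the SQL string literal
    offset = page * page_size
    return (f"SELECT * FROM read_parquet('{safe_path}') "
            f"LIMIT {page_size} OFFSET {offset}")


def absolute_row_numbers(page: int, rows_on_page: int, page_size: int = PAGE_SIZE) -> range:
    """1-based row numbers for a page, counted from the start of the file."""
    start = page * page_size + 1
    return range(start, start + rows_on_page)
```

For example, page 2 (the third page) starts at row 10,001 of the file rather than at row 1.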

Step 6: Sort and Filter Data

Sorting: Click any column header to sort the current page in ascending order. Click again for descending order; click a third time to restore the original row order.
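The three-click cycle amounts to a small state machine over the rows already loaded. In this hypothetical sketch, the unsorted page is never mutated, so the third click can restore it exactly; placing NULL values last in both directions is an assumption, not documented viewer behavior.

```python
from typing import Any, Dict, List, Optional

# None = original order, then ascending, then descending, then back to original.
SORT_CYCLE: Dict[Optional[str], Optional[str]] = {None: "asc", "asc": "desc", "desc": None}


def next_sort_state(state: Optional[str]) -> Optional[str]:
    return SORT_CYCLE[state]


def sort_page(rows: List[Dict[str, Any]], column: str,
              state: Optional[str]) -> List[Dict[str, Any]]:
    """Return a sorted copy of the loaded page; the input list is left untouched."""
    if state is None:
        return list(rows)  # original row order
    present = [r for r in rows if r[column] is not None]
    missing = [r for r in rows if r[column] is None]
    present.sort(key=lambda r: r[column], reverse=(state == "desc"))
    return present + missing  # NULLs last in both directions (assumption)
```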

Global search: Type in the search box in the toolbar to search across all visible columns at once. Rows that do not contain the search term in any column are hidden.

Column filters: Click the filter icon in any column header for column-specific filtering:

Values mode: A checklist of the distinct values in that column, taken from the currently loaded page. Uncheck values to hide matching rows.
Conditions mode: Apply conditions such as "contains", "equals", "greater than", or "is empty". Combine two conditions with AND or OR logic.
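Global search and Conditions mode can both be expressed as row predicates over the loaded page. The sketch below is illustrative only: the function names are hypothetical, and although the four condition labels mirror the text above, their exact semantics (case-insensitive "contains", string equality for "equals", NULL treated as empty) are assumptions.

```python
from typing import Any, Callable, Dict, List, Optional, Tuple

Row = Dict[str, Any]
Condition = Tuple[str, Any]  # (operator label, argument), e.g. ("greater than", 100)


def _text(value: Any) -> str:
    return "" if value is None else str(value)


# Assumed semantics for the four condition labels named in the text.
CONDITIONS: Dict[str, Callable[[Any, Any], bool]] = {
    "contains": lambda v, arg: _text(arg).lower() in _text(v).lower(),
    "equals": lambda v, arg: _text(v) == _text(arg),
    "greater than": lambda v, arg: v is not None and v > arg,  # numeric columns only
    "is empty": lambda v, arg: _text(v) == "",
}


def matches_search(row: Row, term: str) -> bool:
    """Global search: keep the row if any column contains the term (case-insensitive)."""
    needle = term.lower()
    return any(needle in _text(v).lower() for v in row.values())


def matches_conditions(value: Any, first: Condition,
                       second: Optional[Condition] = None, logic: str = "AND") -> bool:
    ok = CONDITIONS[first[0]](value, first[1])
    if second is None:
        return ok
    ok2 = CONDITIONS[second[0]](value, second[1])
    return (ok and ok2) if logic == "AND" else (ok or ok2)  # anything else means OR


def filter_page(rows: List[Row], search: str = "", column: Optional[str] = None,
                first: Optional[Condition] = None, second: Optional[Condition] = None,
                logic: str = "AND") -> List[Row]:
    """Apply global search, then one column's conditions, to the loaded page only."""
    kept = []
    for row in rows:
        if search and not matches_search(row, search):
            continue
        if column is not None and first is not None:
            if not matches_conditions(row[column], first, second, logic):
                continue
        kept.append(row)
    return kept
```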

Column filters operate only on the currently loaded page. To filter every row of a very large Parquet file, consider the DuckDB Viewer instead: load the file with DuckDB's read_parquet() function and apply an SQL WHERE clause, so that DuckDB evaluates the filter against the whole file rather than only the loaded page.
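As an illustration of moving a column filter into SQL, the hypothetical helper below turns the same kinds of conditions into a read_parquet() query with ? placeholders, which DuckDB accepts as prepared-statement parameters. The condition names, the ILIKE translation of "contains", and the file name are assumptions, not the DuckDB Viewer's actual behavior.

```python
from typing import Any, List, Optional, Tuple

Condition = Tuple[str, Any]

# Hypothetical translation of the UI condition labels into DuckDB SQL.
SQL_TEMPLATES = {
    "contains": "CAST({col} AS VARCHAR) ILIKE ?",  # % and _ in the term act as wildcards
    "equals": "{col} = ?",
    "greater than": "{col} > ?",
    "is empty": "{col} IS NULL OR CAST({col} AS VARCHAR) = ''",
}


def quote_ident(name: str) -> str:
    """Double-quote a column name, escaping embedded quotes."""
    return '"' + name.replace('"', '""') + '"'


def full_file_query(path: str, column: str, first: Condition,
                    second: Optional[Condition] = None,
                    logic: str = "AND") -> Tuple[str, List[Any]]:
    """Return (sql, params) that filter every row of the file, not just one page."""
    if logic not in ("AND", "OR"):
        raise ValueError("logic must be AND or OR")
    col = quote_ident(column)
    parts: List[str] = []
    params: List[Any] = []
    for cond in (first, second):
        if cond is None:
            continue
        op, arg = cond
        parts.append("(" + SQL_TEMPLATES[op].format(col=col) + ")")
        if op == "contains":
            params.append(f"%{arg}%")
        elif op != "is empty":
            params.append(arg)
    safe_path = path.replace("'", "''")
    where = f" {logic} ".join(parts)
    return f"SELECT * FROM read_parquet('{safe_path}') WHERE {where}", params
```

With the duckdb Python package installed, the resulting pair could be run with something like duckdb.connect().execute(sql, params).fetchall(). Passing values as parameters rather than pasting them into the SQL text avoids quoting mistakes.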

Step 7: Inspect Cell Values

Click any cell to open the Cell Detail Panel on the right side of the viewer. This panel shows the row number, column name, full DuckDB type string (e.g., DECIMAL(18,6)), character length, and the full cell value without truncation.

For nested Parquet types (LIST, STRUCT, MAP), the cell value is a JSON string — the detail panel automatically pretty-prints it as formatted JSON so you can read complex nested values clearly. Use the Copy value button to copy the raw value to the clipboard.
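Pretty-printing a nested value is a parse-and-reformat step. This Python sketch assumes, as described above, that the cell holds valid JSON text; anything that is not a JSON object or array is returned unchanged. The function name pretty_cell is hypothetical.

```python
import json
from typing import Any


def pretty_cell(value: Any) -> Any:
    """Pretty-print a JSON object/array cell; return anything else unchanged."""
    if not isinstance(value, str):
        return value  # NULLs and non-text values are shown as-is
    try:
        parsed = json.loads(value)
    except ValueError:  # json.JSONDecodeError is a subclass of ValueError
        return value
    if isinstance(parsed, (dict, list)):
        return json.dumps(parsed, indent=2, ensure_ascii=False)
    return value
```

Keeping the formatting purely for display means a copy action can still hand back the original raw string.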

Step 8: Inspect the Schema

Click the Schema button in the toolbar to open the column schema modal. For each column it shows the column name and full DuckDB type string, derived from a DESCRIBE query against the Parquet file.

This is particularly useful when you need to:

Confirm the precision of DECIMAL columns before importing into a database
Identify TIMESTAMP columns and their timezone information
Document the schema for a data contract or pipeline specification
Set up a target table with matching column types for a data migration
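For the migration and DECIMAL-precision use cases, the (name, type) pairs shown in the Schema modal are enough to work with. This hypothetical sketch parses DECIMAL(p,s) into precision and scale and builds a CREATE TABLE statement; it copies DuckDB type names verbatim, so a different target database may need some types renamed.

```python
import re
from typing import List, Optional, Tuple

DECIMAL_RE = re.compile(r"DECIMAL\((\d+)\s*,\s*(\d+)\)", re.IGNORECASE)


def decimal_precision(type_str: str) -> Optional[Tuple[int, int]]:
    """Return (precision, scale) for a DECIMAL(p,s) type string, else None."""
    m = DECIMAL_RE.fullmatch(type_str.strip())
    return (int(m.group(1)), int(m.group(2))) if m else None


def quote_ident(name: str) -> str:
    """Double-quote an identifier, escaping embedded quotes."""
    return '"' + name.replace('"', '""') + '"'


def create_table_ddl(table: str, schema: List[Tuple[str, str]]) -> str:
    """Build CREATE TABLE from (column name, DuckDB type) pairs, copying types verbatim."""
    cols = ",\n".join(f"    {quote_ident(name)} {typ}" for name, typ in schema)
    return f"CREATE TABLE {quote_ident(table)} (\n{cols}\n);"
```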

Use the Copy Schema button to copy the full column list as plain text.

Step 9: Export Your Data

Click the Export button in the toolbar to open the export dialog. Four formats are available:

CSV — flat comma-separated file; nested types as JSON strings; NULL as empty string
JSON — array of objects with column names as keys; useful for downstream processing
Excel (.xlsx) — workbook with frozen header row, auto-sized columns, and attribution sheet
TSV — tab-separated; useful when values contain commas
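The CSV, TSV, and JSON rules above (NULL as an empty string, nested values as JSON strings) can be reproduced with Python's standard library. This is a sketch of those rules, not the viewer's exporter; to_delimited and to_json are hypothetical names, and Excel output is omitted because it needs a third-party library.

```python
import csv
import io
import json
from typing import Any, Dict, List


def _export_cell(value: Any) -> Any:
    if value is None:
        return ""                      # NULL -> empty string
    if isinstance(value, (dict, list)):
        return json.dumps(value)       # nested LIST/STRUCT/MAP -> JSON string
    return value


def to_delimited(rows: List[Dict[str, Any]], columns: List[str], delimiter: str = ",") -> str:
    """CSV text (comma delimiter) or TSV text (tab delimiter) with a header row."""
    buf = io.StringIO()
    writer = csv.writer(buf, delimiter=delimiter, lineterminator="\n")
    writer.writerow(columns)
    for row in rows:
        writer.writerow([_export_cell(row.get(col)) for col in columns])
    return buf.getvalue()


def to_json(rows: List[Dict[str, Any]]) -> str:
    """Array of objects keyed by column name; NULL becomes JSON null."""
    return json.dumps(rows, default=str)  # default=str covers dates, Decimals, etc.
```

The csv module quotes only fields that need it, which is why a value like "X, Inc" is quoted in CSV but not in TSV.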

Two export scopes let you control what gets exported:

Filtered view — exports only the rows on the current page that are visible after applying your search and column filters
Full file — issues a SELECT * FROM read_parquet('file') query to DuckDB-Wasm and exports all rows from the entire file, bypassing any page limits or filters. Note: for very large files this may take a moment and use significant browser memory.

Tip: Use Full file CSV export to quickly convert a Parquet file to CSV without writing any Python or using any command-line tools — all processing happens in your browser.