How to Open & Browse a Parquet File: Step-by-Step Tutorial
This tutorial walks you through opening and exploring an Apache Parquet file using the free FinancialDataTools.com Parquet Viewer. The tool uses DuckDB-Wasm — the official WebAssembly build of the DuckDB analytical engine — to read Parquet natively in your browser. Nothing is sent to any server.
Step 1: Locate Your Parquet File
Find the .parquet file you want to inspect. Parquet is the dominant columnar storage format for analytical data and appears in many financial data engineering workflows:
- DataFrames saved with df.to_parquet('file.parquet') in Python/pandas
- Output files from Apache Spark, Flink, or dbt data pipelines
- Market data archives from financial data vendors (tick data, OHLCV history)
- Feature store outputs from financial ML pipelines
- Tables exported from Snowflake, BigQuery, Redshift, or Databricks in Parquet format
- Backtest result datasets stored in Parquet by algorithmic trading frameworks
The viewer supports all standard Parquet versions (1.0 and 2.x) and compression codecs (Snappy, Gzip, LZ4, Zstandard, and uncompressed).
Step 2: Open the Parquet Viewer
Navigate to financialdatatools.com/viewers/parquet-viewer/ in any modern desktop browser. No login, account, or installation is required. The viewer works best on desktop.
Step 3: Load Your File
There are two ways to open your Parquet file:
- Click the green "Open File" button in the toolbar and select your .parquet file.
- Drag and drop your file anywhere onto the viewer window.
Step 4: Understand the Loading Process
Unlike CSV or JSON, Parquet is a binary columnar format: it cannot be read as plain text and needs a Parquet-aware reader. The viewer uses DuckDB-Wasm, which initializes in a background Web Worker. Loading happens in three stages shown in the status indicator:
- Initialising DuckDB-Wasm — the DuckDB engine loads into the browser (first load only; subsequent files use the cached engine)
- Reading Parquet metadata — DuckDB reads the Parquet file footer to get column names, types, and row count without loading any row data yet
- Loading rows — the first page of up to 5,000 rows is fetched
For most Parquet files this entire process completes within a few seconds. Files with many row groups or complex nested schemas may take slightly longer during the metadata step.
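The metadata stage works because of how Parquet files are laid out: an unencrypted file starts and ends with the 4-byte marker PAR1, and the 4 bytes just before the trailing marker hold the footer length as a little-endian integer. The following is a minimal standard-library sketch of that layout check. The helper name footer_info is hypothetical, and the sketch does not decode the Thrift-encoded metadata the way DuckDB does.

```python
import struct

MAGIC = b"PAR1"  # 4-byte marker at both ends of an unencrypted Parquet file


def footer_info(data: bytes) -> dict:
    """Return the footer length and offset for raw Parquet bytes.

    Hypothetical helper for illustration: it validates the file layout only
    and does not decode the metadata stored inside the footer.
    """
    if len(data) < 12:
        raise ValueError("too short to be a Parquet file")
    if data[:4] != MAGIC or data[-4:] != MAGIC:
        raise ValueError("missing PAR1 magic bytes")
    # The 4 bytes before the trailing magic store the footer length
    # as a little-endian unsigned integer.
    (footer_length,) = struct.unpack("<I", data[-8:-4])
    footer_offset = len(data) - 8 - footer_length
    if footer_offset < 4:
        raise ValueError("footer length exceeds file size")
    return {"footer_length": footer_length, "footer_offset": footer_offset}
```

For a real file you would normally seek to the last 8 bytes rather than read everything into memory; the whole-buffer version above just keeps the sketch short.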
Once loaded, the stats bar shows the total row count (from COUNT(*)), visible rows, column count, and the engine label DuckDB-Wasm.
Step 5: Browse Columns and Rows
Your data appears in a spreadsheet-style grid. Each column header shows:
- The column name as stored in the Parquet file metadata
- A type badge showing the DuckDB type category — INT, FLOAT, TEXT, BOOL, DATE, or BLOB
- A tooltip with the full type string (e.g., DECIMAL(18,6), TIMESTAMP WITH TIME ZONE) when you hover over the badge
- A filter button for column-level filtering
Numeric columns (INT, FLOAT) are right-aligned and shown in blue. Boolean values appear in purple as true/false. Date and timestamp values are shown in an ISO 8601-style format (YYYY-MM-DD, or YYYY-MM-DD HH:MM:SS with a space in place of the T separator). Nested types (LIST, MAP, STRUCT) are shown as their JSON string representation.
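To make the badge idea concrete, here is one way a full DuckDB type string could be collapsed into the six badge categories. This is only a sketch: the grouping table and the type_badge function are assumptions made for illustration, and the viewer's real rules may differ (for example in how it treats DECIMAL or TIME columns).

```python
import re

# Assumed grouping of DuckDB base type names into badge categories.
# The viewer's actual mapping is not documented here and may differ.
BADGE_BY_BASE_TYPE = {
    "TINYINT": "INT", "SMALLINT": "INT", "INTEGER": "INT", "BIGINT": "INT",
    "HUGEINT": "INT", "UTINYINT": "INT", "USMALLINT": "INT",
    "UINTEGER": "INT", "UBIGINT": "INT",
    "FLOAT": "FLOAT", "DOUBLE": "FLOAT", "DECIMAL": "FLOAT",
    "BOOLEAN": "BOOL",
    "DATE": "DATE",
    "BLOB": "BLOB",
}


def type_badge(duckdb_type: str) -> str:
    """Map a full DuckDB type string to a short badge label."""
    text = duckdb_type.strip().upper()
    # List types such as INTEGER[] are nested values, rendered as JSON text.
    if text.endswith("]"):
        return "TEXT"
    # Covers TIMESTAMP, TIMESTAMP_NS, TIMESTAMP WITH TIME ZONE, and so on.
    if text.startswith("TIMESTAMP"):
        return "DATE"
    # Leading identifier only: "DECIMAL(18,6)" becomes "DECIMAL".
    match = re.match(r"[A-Z_]+", text)
    base = match.group(0) if match else ""
    # Anything unrecognised (VARCHAR, STRUCT, MAP, ...) falls back to TEXT.
    return BADGE_BY_BASE_TYPE.get(base, "TEXT")
```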
Row numbers appear on the left side of the grid. For paginated files, row numbers reflect absolute positions across the entire file.
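The relationship between pages and absolute row numbers can be sketched as follows. The 5,000-row page size comes from this tutorial; the use of LIMIT/OFFSET and both function names are assumptions for illustration, not a description of the viewer's internals.

```python
PAGE_SIZE = 5000  # page size stated in this tutorial


def page_query(path: str, page_index: int, page_size: int = PAGE_SIZE) -> str:
    """Build a LIMIT/OFFSET query for one zero-based page of a Parquet file."""
    if page_index < 0:
        raise ValueError("page_index must be zero or greater")
    escaped = path.replace("'", "''")  # escape single quotes in the SQL literal
    offset = page_index * page_size
    return (
        f"SELECT * FROM read_parquet('{escaped}') "
        f"LIMIT {page_size} OFFSET {offset}"
    )


def absolute_row_number(page_index: int, row_in_page: int,
                        page_size: int = PAGE_SIZE) -> int:
    """Return the 1-based row number across the whole file.

    page_index and row_in_page are both zero-based.
    """
    return page_index * page_size + row_in_page + 1
```

Under these assumptions, the first row of the second page is row 5,001 of the file, which matches the idea that row numbers are absolute rather than restarting on each page.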
Step 6: Sort and Filter Data
Sorting: Click any column header to sort the current page ascending. Click again for descending; click a third time to restore original row order.
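The three-click cycle (ascending, descending, original order) can be modelled with a small state list. This is an illustrative sketch only: the function names are hypothetical, and placing empty (NULL) values last in both directions is an assumption rather than documented viewer behaviour.

```python
from typing import Any, Dict, List, Optional

SORT_CYCLE = [None, "asc", "desc"]  # None means original row order


def next_sort_state(current: Optional[str]) -> Optional[str]:
    """Advance to the next state each time the column header is clicked."""
    position = SORT_CYCLE.index(current)
    return SORT_CYCLE[(position + 1) % len(SORT_CYCLE)]


def sort_page(rows: List[Dict[str, Any]], column: str,
              state: Optional[str]) -> List[Dict[str, Any]]:
    """Return a sorted copy of the current page; the input list is untouched."""
    if state is None:
        return list(rows)
    # Assumption: rows with a missing value stay at the end in both directions.
    present = [row for row in rows if row[column] is not None]
    missing = [row for row in rows if row[column] is None]
    present.sort(key=lambda row: row[column], reverse=(state == "desc"))
    return present + missing
```

Because the sort returns a copy, restoring the original order on the third click needs no extra bookkeeping: the untouched page is simply shown again.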
Global search: Type in the search box in the toolbar to search across all visible columns simultaneously. Rows not containing the search term in any column are hidden.
Column filters: Click the filter icon in any column header for column-specific filtering:
- Values mode: A checklist of the distinct values found in that column on the current page. Uncheck values to hide matching rows.
- Conditions mode: Apply conditions such as "contains", "equals", "greater than", or "is empty". Combine two conditions with AND or OR logic.
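Conditions mode can be pictured as one or two predicates combined with AND or OR. The sketch below uses the four condition names quoted above; their exact semantics (for example, case-insensitive "contains") and all function names are assumptions made for illustration.

```python
from typing import Any, Callable, Dict, List, Optional, Tuple

Condition = Tuple[str, Any]  # (condition name, argument)

# Assumed semantics for the condition names quoted in the tutorial.
CONDITIONS: Dict[str, Callable[[Any, Any], bool]] = {
    "contains": lambda value, arg: value is not None
    and str(arg).lower() in str(value).lower(),
    "equals": lambda value, arg: value == arg,
    "greater than": lambda value, arg: value is not None and value > arg,
    "is empty": lambda value, arg: value is None or value == "",
}


def matches(value: Any, first: Condition,
            second: Optional[Condition] = None, combine: str = "AND") -> bool:
    """Evaluate one or two conditions against a single cell value."""
    result = CONDITIONS[first[0]](value, first[1])
    if second is None:
        return result
    other = CONDITIONS[second[0]](value, second[1])
    return (result and other) if combine == "AND" else (result or other)


def filter_page(rows: List[Dict[str, Any]], column: str, first: Condition,
                second: Optional[Condition] = None,
                combine: str = "AND") -> List[Dict[str, Any]]:
    """Keep only the rows of the current page whose cell passes the filter."""
    return [row for row in rows if matches(row[column], first, second, combine)]
```

For example, filter_page(rows, "px", ("greater than", 100.0), ("greater than", 400.0), "AND") keeps only rows whose hypothetical px column exceeds 400.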
Column filters operate only on the currently loaded page. To filter a very large Parquet file across all rows, consider the DuckDB Viewer, where you can load the file with DuckDB's read_parquet() function and apply SQL WHERE clauses so that the query engine filters the whole file rather than a single page.
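As a rough illustration of whole-file filtering with read_parquet() and a WHERE clause, the sketch below builds such a query and, optionally, runs it with the duckdb Python package. The file name trades.parquet and the price column are hypothetical, and the WHERE text is inserted verbatim, so it should only ever come from a trusted source.

```python
def build_filter_query(path: str, where_clause: str) -> str:
    """Build a whole-file filter query over a Parquet file.

    The path is escaped as a SQL string literal; where_clause is inserted
    as-is and must come from a trusted source.
    """
    escaped = path.replace("'", "''")
    return f"SELECT * FROM read_parquet('{escaped}') WHERE {where_clause}"


def run_filter(path: str, where_clause: str) -> list:
    """Run the query with the third-party duckdb package (pip install duckdb)."""
    import duckdb  # imported lazily so the query builder works without it

    return duckdb.sql(build_filter_query(path, where_clause)).fetchall()
```

Calling build_filter_query("trades.parquet", "price > 100") yields a statement you could also paste into any DuckDB SQL prompt.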
Step 7: Inspect Cell Values
Click any cell to open the Cell Detail Panel on the right side of the viewer. This panel shows the row number, column name, full DuckDB type string (e.g., DECIMAL(18,6)), character length, and the full cell value without truncation.
For nested Parquet types (LIST, STRUCT, MAP), the cell value is a JSON string — the detail panel automatically pretty-prints it as formatted JSON so you can read complex nested values clearly. Use the Copy value button to copy the raw value to the clipboard.
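The pretty-printing step can be approximated in a few lines with Python's json module. This is a sketch of the general technique, not the viewer's code; pretty_cell is a hypothetical name, and leaving plain scalars untouched is an assumption.

```python
import json
from typing import Any


def pretty_cell(value: Any) -> str:
    """Return indented JSON for nested values, or the plain text otherwise."""
    if value is None:
        return ""
    text = str(value)
    try:
        parsed = json.loads(text)
    except ValueError:  # not valid JSON, so show the text unchanged
        return text
    # Only objects and arrays are reformatted; scalars such as "42" stay as-is.
    if isinstance(parsed, (dict, list)):
        return json.dumps(parsed, indent=2, ensure_ascii=False)
    return text
```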
Step 8: Inspect the Schema
Click the Schema button in the toolbar to open the column schema modal. For each column it shows the column name and full DuckDB type string, derived from a DESCRIBE query against the Parquet file.
This is particularly useful when you need to:
- Confirm the precision of DECIMAL columns before importing into a database
- Identify TIMESTAMP columns and their timezone information
- Document the schema for a data contract or pipeline specification
- Set up a target table with matching column types for a data migration
Use the Copy Schema button to copy the full column list as plain text.
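If the goal is a target table with matching column types, the copied name and type pairs can be turned into a CREATE TABLE statement. The sketch below assumes you already have the schema as (name, type) pairs; the function name and the sample columns are hypothetical, and DuckDB type names may need translating for other databases.

```python
from typing import List, Tuple


def create_table_sql(table_name: str, columns: List[Tuple[str, str]]) -> str:
    """Build a CREATE TABLE statement from (column name, type string) pairs."""

    def quote(identifier: str) -> str:
        # Standard SQL identifier quoting: wrap in double quotes, double any inside.
        return '"' + identifier.replace('"', '""') + '"'

    body = ",\n".join(
        f"    {quote(name)} {col_type}" for name, col_type in columns
    )
    return f"CREATE TABLE {quote(table_name)} (\n{body}\n);"
```

For instance, create_table_sql("trades", [("symbol", "VARCHAR"), ("price", "DECIMAL(18,6)")]) keeps the DECIMAL precision exactly as reported in the schema.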
Step 9: Export Your Data
Click the Export button in the toolbar to open the export dialog. Four formats are available:
- CSV — flat comma-separated file; nested types as JSON strings; NULL as empty string
- JSON — array of objects with column names as keys; useful for downstream processing
- Excel (.xlsx) — workbook with frozen header row, auto-sized columns, and attribution sheet
- TSV — tab-separated; useful when values contain commas
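The CSV conventions listed above (nested values as JSON strings, NULL as an empty string) can be reproduced with Python's csv and json modules. This is a minimal sketch under those stated conventions; rows_to_csv and the sample column names are hypothetical.

```python
import csv
import io
import json
from typing import Any, Dict, List


def rows_to_csv(columns: List[str], rows: List[Dict[str, Any]],
                delimiter: str = ",") -> str:
    """Serialise rows to delimited text using the conventions described above."""
    buffer = io.StringIO()
    writer = csv.writer(buffer, delimiter=delimiter, lineterminator="\n")
    writer.writerow(columns)
    for row in rows:
        cells = []
        for name in columns:
            value = row.get(name)
            if value is None:
                cells.append("")  # NULL becomes an empty field
            elif isinstance(value, (dict, list)):
                cells.append(json.dumps(value))  # nested value as a JSON string
            else:
                cells.append(value)
        writer.writerow(cells)
    return buffer.getvalue()
```

Passing delimiter="\t" gives a tab-separated variant; the csv module still quotes any field that contains the delimiter, a quote character, or a line break.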
Two export scopes let you control what gets exported:
- Filtered view — exports only the rows on the current page that are visible after applying your search and column filters
- Full file — issues a SELECT * FROM read_parquet('file') query to DuckDB-Wasm and exports all rows from the entire file, bypassing any page limits or filters. Note: for very large files this may take a moment and use significant browser memory.
Tip: Use Full file CSV export to quickly convert a Parquet file to CSV without writing any Python or using any command-line tools — all processing happens in your browser.
