DuckDB

DuckDB is an in-process SQL OLAP database management system.

Like SQLite, it is particularly useful for:

  • Processing and storing tabular datasets, e.g. from CSV or Parquet files
  • Interactive data analysis, e.g. joining and aggregating multiple large tables

Embed

TODO: Explain how one or more DuckDB files could be attached to a signed PDF file

Convert

TODO: Explain how a user can drop CSV, Excel, FEC files and have them converted to a single DuckDB database file.

Attach

The PDF document may contain a link to the database rather than the database file itself (thanks to DuckDB's ATTACH statement). A hash of the linked file should be computed and stored in the PDF document to ensure data integrity.
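
The integrity check could be as simple as the sketch below, which uses only the Python standard library. The choice of SHA-256 and the function names are assumptions; how the digest is stored in the PDF is not shown.

```python
import hashlib


def file_sha256(path, chunk_size=1 << 20):
    """Return the hex SHA-256 digest of a file, read in chunks."""
    digest = hashlib.sha256()
    with open(path, "rb") as f:
        for chunk in iter(lambda: f.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def verify_linked_database(path, expected_hex):
    """Compare a linked file with the digest stored in the PDF (hypothetical)."""
    return file_sha256(path) == expected_hex.lower()
```

This hashes the raw bytes of the file, so any rewrite of the file changes the digest even if the data is logically the same; the Hash section discusses why a canonical form is needed.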

Multiple sources can be linked/attached and processed as a single database.

Multiple formats can then be supported out of the box:

  • SQLite
  • CSV
  • Microsoft Excel
  • OpenDocument
  • ...

Hash

We need a canonical form of a DuckDB database file to compute a hash.

See this DuckDB discussion.

Diff

TODO: link to DeltaV module

Query

To query the database from a checklist we can use:

  • DuckDB SQL statements
  • polars Python script
  • pandas Python script