Parqify

Parqify provides a server application, packaged as an AMI, that customers can deploy within their AWS environment.

The server consumes CSV and JSON files from a specified input S3 bucket, converts them to the Parquet format, and writes the converted files to a separate output S3 bucket.

Key Features of Parqify

Data Flow

  1. Customer places files: CSV or JSON files are uploaded by the customer to a designated input S3 bucket.
  2. Server monitors: The server application, running on an EC2 instance launched from the AMI, continuously monitors the input S3 bucket for new files.
  3. File download: When a new file is detected, the server application downloads it from the input S3 bucket.
  4. File conversion: The server application converts the downloaded file from CSV or JSON format to Parquet format. During this step, custom schema definitions, partitioning, and compression options can be applied.
  5. Parquet file upload: The newly converted Parquet file is then uploaded to a designated output S3 bucket.
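
The five steps above can be sketched as a single monitoring pass. This is an illustrative outline only, not Parqify's actual implementation: the `ObjectStore` interface, the function names, and the `seen` set used to detect new files are all hypothetical, and the converter assumes the pyarrow library with newline-delimited JSON input and Snappy compression. Storage access and conversion are passed in as parameters so the flow itself stays independent of any particular S3 client.

```python
import io
import os
from typing import Callable, Iterable, List, Protocol, Set


class ObjectStore(Protocol):
    """Hypothetical minimal view of S3 (not Parqify's real interface)."""

    def list_keys(self, bucket: str) -> Iterable[str]: ...
    def download(self, bucket: str, key: str) -> bytes: ...
    def upload(self, bucket: str, key: str, data: bytes) -> None: ...


SUPPORTED_EXTENSIONS = {".csv", ".json"}


def output_key_for(input_key: str) -> str:
    """Map an input key such as 'dir/file.csv' to 'dir/file.parquet'."""
    stem, _ext = os.path.splitext(input_key)
    return stem + ".parquet"


def convert_with_pyarrow(raw: bytes, ext: str) -> bytes:
    """One possible converter, assuming pyarrow is installed.

    Assumption: JSON input is newline-delimited, which is what
    pyarrow.json.read_json expects.
    """
    # Imported lazily so the rest of the sketch runs without pyarrow.
    import pyarrow.csv as pa_csv
    import pyarrow.json as pa_json
    import pyarrow.parquet as pq

    source = io.BytesIO(raw)
    table = pa_csv.read_csv(source) if ext == ".csv" else pa_json.read_json(source)
    buffer = io.BytesIO()
    pq.write_table(table, buffer, compression="snappy")
    return buffer.getvalue()


def process_new_files(
    store: ObjectStore,
    input_bucket: str,
    output_bucket: str,
    convert: Callable[[bytes, str], bytes],
    seen: Set[str],
) -> List[str]:
    """Run one pass of steps 2 to 5 and return the output keys written."""
    written = []
    for key in store.list_keys(input_bucket):          # step 2: monitor
        ext = os.path.splitext(key)[1].lower()
        if key in seen or ext not in SUPPORTED_EXTENSIONS:
            continue                                   # old or unsupported file
        raw = store.download(input_bucket, key)        # step 3: download
        parquet_bytes = convert(raw, ext)              # step 4: convert
        out_key = output_key_for(key)
        store.upload(output_bucket, out_key, parquet_bytes)  # step 5: upload
        seen.add(key)
        written.append(out_key)
    return written
```

A long-running server would call `process_new_files` repeatedly, for example on a timer, with `convert_with_pyarrow` as the converter and an `ObjectStore` backed by an S3 client. Schema definitions and partitioning, which step 4 mentions, are left out of this sketch.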

Parqify schema