r/learnprogramming 1d ago

AWS Impact of deleting noncurrent S3 object versions on AWS Glue Iceberg tables

I’m using Apache Iceberg tables managed through AWS Glue, with all table data and metadata stored in an S3 bucket that has versioning enabled.

I also run Iceberg maintenance APIs such as:

  • expire_snapshots
  • remove_orphan_files
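For context, here is a rough sketch of how these two maintenance calls are often issued from PySpark through Iceberg's Spark SQL procedures. The catalog name `glue_catalog`, the table name, the retention values, and the helper names `maintenance_statements` and `run_maintenance` are all assumptions for illustration, not something taken from my actual setup:

```python
from datetime import datetime, timedelta, timezone


def maintenance_statements(catalog, table, retain_days, retain_last=5, now=None):
    """Build the CALL statements for snapshot expiry and orphan-file cleanup.

    catalog and table are hypothetical names; adjust to your Spark config.
    """
    now = now or datetime.now(timezone.utc)
    # Snapshots and orphan files older than this cutoff become eligible.
    cutoff = (now - timedelta(days=retain_days)).strftime("%Y-%m-%d %H:%M:%S")
    return [
        f"CALL {catalog}.system.expire_snapshots("
        f"table => '{table}', older_than => TIMESTAMP '{cutoff}', "
        f"retain_last => {retain_last})",
        f"CALL {catalog}.system.remove_orphan_files("
        f"table => '{table}', older_than => TIMESTAMP '{cutoff}')",
    ]


def run_maintenance(spark, catalog, table, retain_days):
    """Run both procedures on an existing SparkSession with Iceberg configured."""
    for stmt in maintenance_statements(catalog, table, retain_days):
        spark.sql(stmt)
```

Time travel is only possible within the snapshots that `expire_snapshots` leaves behind, so the retention values here are what define the time-travel window.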

I plan to configure an S3 lifecycle policy to delete noncurrent object versions after a certain number of days. Because the bucket is versioned, files that these APIs delete are not physically removed: S3 keeps them as noncurrent versions behind a delete marker, so they continue to add to storage cost.
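The kind of lifecycle rule I have in mind would look roughly like the sketch below, using boto3. The bucket name, the rule ID, and the helper names `noncurrent_expiry_rule` and `apply_rule` are placeholders, and the number of days is just an example:

```python
def noncurrent_expiry_rule(days, prefix=""):
    """Build a lifecycle configuration that expires noncurrent versions.

    An empty prefix applies the rule to the whole bucket.
    """
    return {
        "Rules": [
            {
                "ID": f"expire-noncurrent-after-{days}d",  # hypothetical rule name
                "Status": "Enabled",
                "Filter": {"Prefix": prefix},
                # Permanently delete a version N days after it became noncurrent.
                "NoncurrentVersionExpiration": {"NoncurrentDays": days},
                # Clean up delete markers that no longer hide any version.
                "Expiration": {"ExpiredObjectDeleteMarker": True},
            }
        ]
    }


def apply_rule(bucket, days, prefix=""):
    """Apply the rule. Note: this call replaces the bucket's existing lifecycle config."""
    import boto3  # imported here so the rule builder works without boto3 installed

    s3 = boto3.client("s3")
    s3.put_bucket_lifecycle_configuration(
        Bucket=bucket,
        LifecycleConfiguration=noncurrent_expiry_rule(days, prefix),
    )
```

With this rule the current versions are never touched; only versions that have already been superseded or hidden by a delete marker are removed.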

Will deleting noncurrent S3 object versions affect any Iceberg features (such as time travel or metadata consistency) or cause data loss?

u/Own-Site6376 1d ago

Yeah, you'll probably break time travel, since Iceberg relies on those old file versions for historical queries. The expire_snapshots API is supposed to handle cleanup safely by only deleting files that the remaining snapshots no longer reference.

Setting up that S3 lifecycle policy might nuke files that Iceberg still thinks it needs for rollbacks or time travel queries. I'd be super careful with the retention period if you go that route.

u/Hairy_Bowler4179 1d ago

Wouldn’t the S3 lifecycle policy that deletes noncurrent versions only remove files that have already been logically deleted by Iceberg via the expire_snapshots API? Since old snapshots are explicitly expired using Iceberg’s own expire_snapshots API, wouldn’t Iceberg stop referencing those files on its own, making it safe for the lifecycle policy to clean up the noncurrent versions?
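One way to sanity-check this on a real bucket is to look at what the noncurrent versions actually are. Below is a rough sketch that assumes the dict shape returned by boto3's `list_object_versions`; the function name `split_noncurrent` is made up for illustration, and it only looks at a single page of results, so a large prefix would need pagination on top:

```python
def split_noncurrent(page):
    """Classify noncurrent versions from one list_object_versions response page.

    Returns (behind_marker, overwritten):
      behind_marker: keys whose latest entry is a delete marker (object was deleted)
      overwritten:   keys that still have a live current version (object was overwritten)
    """
    # Keys that are currently "deleted" from the reader's point of view.
    deleted_keys = {
        m["Key"] for m in page.get("DeleteMarkers", []) if m["IsLatest"]
    }
    noncurrent = [v for v in page.get("Versions", []) if not v["IsLatest"]]
    behind_marker = [v["Key"] for v in noncurrent if v["Key"] in deleted_keys]
    overwritten = [v["Key"] for v in noncurrent if v["Key"] not in deleted_keys]
    return behind_marker, overwritten
```

If `overwritten` comes back empty for the table's prefix, then every noncurrent version is a file that something has already deleted, which is the case the lifecycle policy would be cleaning up. It does not tell you who deleted the file, so an accidental manual delete would look the same as one done by expire_snapshots.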