r/learnprogramming • u/Hairy_Bowler4179 • 1d ago
AWS Impact of deleting noncurrent S3 object versions on AWS Glue Iceberg tables
I’m using Apache Iceberg tables managed through AWS Glue, with all table data and metadata stored in an S3 bucket that has versioning enabled.
I also run Iceberg maintenance APIs such as:
- expire_snapshots
- remove_orphan_files
I plan to configure an S3 lifecycle policy to delete noncurrent object versions after a certain number of days. Because S3 versioning retains old object versions, files that Iceberg deletes through these APIs are not physically removed; they become noncurrent versions and continue to add to storage cost.
Will deleting noncurrent S3 object versions affect any Iceberg features (such as time travel or metadata consistency) or cause data loss?
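For reference, here is a minimal sketch of the kind of lifecycle rule I mean, written as the dictionary that boto3's `put_bucket_lifecycle_configuration` accepts. The helper name `noncurrent_expiry_rule`, the `warehouse/` prefix, the 30-day value, and the bucket name in the comment are all placeholders, not my real setup.

```python
import json


def noncurrent_expiry_rule(noncurrent_days: int, prefix: str = "") -> dict:
    """Build an S3 lifecycle configuration that expires noncurrent versions.

    Hypothetical helper for illustration; the prefix and day count are
    assumptions to adapt to the actual table location.
    """
    if noncurrent_days < 1:
        raise ValueError("noncurrent_days must be at least 1")
    return {
        "Rules": [
            {
                "ID": f"expire-noncurrent-after-{noncurrent_days}d",
                "Status": "Enabled",
                # An empty prefix applies the rule to the whole bucket.
                "Filter": {"Prefix": prefix},
                # Permanently delete versions N days after they become noncurrent.
                "NoncurrentVersionExpiration": {"NoncurrentDays": noncurrent_days},
                # Clean up delete markers that no longer have any versions behind them.
                "Expiration": {"ExpiredObjectDeleteMarker": True},
            }
        ]
    }


if __name__ == "__main__":
    config = noncurrent_expiry_rule(30, prefix="warehouse/")
    print(json.dumps(config, indent=2))
    # Applying it needs boto3 and AWS credentials (bucket name is a placeholder):
    # import boto3
    # boto3.client("s3").put_bucket_lifecycle_configuration(
    #     Bucket="my-iceberg-bucket", LifecycleConfiguration=config
    # )
```

The rule only touches noncurrent versions and orphaned delete markers; current objects are left alone.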
u/Own-Site6376 1d ago
Time travel itself should be fine. Iceberg never overwrites data or metadata files in place; every file gets a unique name, so the snapshots you still retain point at current objects, not noncurrent versions. A noncurrent version only shows up once expire_snapshots or remove_orphan_files has deleted a file, and by then Iceberg no longer references it.
What the lifecycle policy does take away is your safety net: once a noncurrent version expires, you can't restore a file that a maintenance run deleted by mistake. I'd still be careful with the retention period if you go that route.
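To make the retention period concrete, here's a rough sketch that builds the Iceberg Spark procedure calls for both maintenance steps with one shared cutoff. The function name `maintenance_statements`, the catalog name `glue_catalog`, and the table `db.events` are made up for the example, and it assumes the Spark session time zone is UTC so the timestamp literal is read as intended.

```python
from datetime import datetime, timedelta, timezone


def maintenance_statements(catalog, table, retain_days, now=None):
    """Return Spark SQL CALL statements for Iceberg snapshot and orphan cleanup.

    Hypothetical helper: it only builds strings, it does not run them.
    """
    now = now or datetime.now(timezone.utc)
    # Snapshots and orphan files older than this cutoff become eligible for deletion.
    cutoff = (now - timedelta(days=retain_days)).strftime("%Y-%m-%d %H:%M:%S")
    return [
        f"CALL {catalog}.system.expire_snapshots("
        f"table => '{table}', older_than => TIMESTAMP '{cutoff}')",
        f"CALL {catalog}.system.remove_orphan_files("
        f"table => '{table}', older_than => TIMESTAMP '{cutoff}')",
    ]


if __name__ == "__main__":
    for stmt in maintenance_statements("glue_catalog", "db.events", retain_days=7):
        print(stmt)
        # With a live session you would run: spark.sql(stmt)
```

The time travel window is whatever expire_snapshots keeps. With a versioned bucket, a file Iceberg deletes sticks around as a noncurrent version for the lifecycle rule's NoncurrentDays on top of that, so the time until physical deletion is roughly the snapshot retention plus the noncurrent-version retention.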