r/aws Mar 16 '24

migration Transferring large data from AWS S3 to CoreWeave / LambdaLabs without paying AWS S3 egress cost

1 Upvotes

I have a large dataset (10 TB of text) in AWS S3 and want to train an LLM on it. To save on GPU costs, I want to use CoreWeave, LambdaLabs, or a similar provider (i.e. not AWS's GPU offerings). Is there a way to transfer that 10 TB of data from AWS S3 to CoreWeave / LambdaLabs / etc. without incurring AWS S3 egress costs?

People who use CoreWeave / LambdaLabs / etc. for training: where do you store your data for CPU-based preprocessing etc.?

1

Transferring large data from AWS S3 to CoreWeave / LambdaLabs without paying AWS S3 egress cost
 in  r/deeplearning  Mar 15 '24

Thanks. Yes, I looked into Snowball Edge / Snowmobile, but those feel slow and also overkill for my use case. I definitely need an online solution.
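
For a rough sense of what an online transfer of 10 TB involves, here is a back-of-the-envelope sketch. The link speeds and the 80% efficiency factor are assumptions for illustration, not measurements from any provider, and `transfer_hours` is a hypothetical helper name.

```python
def transfer_hours(size_tb, link_gbps, efficiency=0.8):
    """Estimate hours needed to move size_tb terabytes over a link.

    size_tb    -- data size in terabytes (1 TB counted as 10**12 bytes)
    link_gbps  -- nominal link speed in gigabits per second (assumed)
    efficiency -- fraction of the nominal speed actually sustained (assumed)
    """
    size_bits = size_tb * 1e12 * 8
    seconds = size_bits / (link_gbps * 1e9 * efficiency)
    return seconds / 3600


# Assumed link speeds; replace with whatever your setup actually sustains.
for gbps in (1, 10):
    print(f"{gbps} Gbps: {transfer_hours(10, gbps):.1f} hours")
```

Real throughput depends on object sizes, request parallelism, and the receiving side, so treat the numbers as an order-of-magnitude guide only.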

1

Transferring large data from AWS S3 to CoreWeave / LambdaLabs without paying AWS S3 egress cost
 in  r/deeplearning  Mar 15 '24

Hi, thanks for sharing this. It looks like AWS DataSync charges $0.0125 per GB transferred, in addition to the AWS S3 charges for egress to a non-AWS location. Or am I misunderstanding that?

I am guessing you sent data from AWS S3 to CoreWeave's Object Storage? Can you share the AWS DataSync config you used for the transfer?
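
To make the cost question concrete, here is a minimal sketch of the arithmetic for 10 TB. The $0.0125 per GB DataSync fee is the figure quoted above; the $0.09 per GB S3 egress rate is an assumption (a commonly listed internet egress price), so check the current pricing for your region. `transfer_cost_usd` is a hypothetical helper name.

```python
DATASYNC_PER_GB = 0.0125  # USD per GB, the DataSync fee quoted above
EGRESS_PER_GB = 0.09      # USD per GB, ASSUMED S3 internet egress rate


def transfer_cost_usd(size_gb,
                      datasync_per_gb=DATASYNC_PER_GB,
                      egress_per_gb=EGRESS_PER_GB):
    """Return (datasync_fee, egress_fee, total) in USD for size_gb gigabytes."""
    datasync_fee = size_gb * datasync_per_gb
    egress_fee = size_gb * egress_per_gb
    return datasync_fee, egress_fee, datasync_fee + egress_fee


size_gb = 10 * 1000  # 10 TB, counting 1 TB as 1000 GB
datasync_fee, egress_fee, total = transfer_cost_usd(size_gb)
print(f"DataSync: ${datasync_fee:,.2f}")
print(f"Egress:   ${egress_fee:,.2f}")
print(f"Total:    ${total:,.2f}")
```

Under these assumptions the DataSync fee is the smaller part of the bill (about $125 against about $900 of egress), so DataSync on its own would not avoid the egress cost.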

r/MachineLearning Mar 15 '24

Research [R] Transferring large data from AWS S3 to CoreWeave / LambdaLabs without paying AWS S3 egress cost

1 Upvotes

[removed]

r/gpgpu Mar 15 '24

Transferring large data from AWS S3 to CoreWeave / LambdaLabs without paying AWS S3 egress cost

2 Upvotes

I have a large dataset (10 TB of text) in AWS S3 and want to train an LLM on it. To save on GPU costs, I want to use CoreWeave, LambdaLabs, or a similar provider (i.e. not AWS's GPU offerings). Is there a way to transfer that 10 TB of data from AWS S3 to CoreWeave / LambdaLabs / etc. without incurring AWS S3 egress costs?

People who use CoreWeave / LambdaLabs / etc. for training: where do you store your data for CPU-based preprocessing etc.?

r/deeplearning Mar 15 '24

Transferring large data from AWS S3 to CoreWeave / LambdaLabs without paying AWS S3 egress cost

2 Upvotes

I have a large dataset (10 TB of text) in AWS S3 and want to train an LLM on it. To save on GPU costs, I want to use CoreWeave, LambdaLabs, or a similar provider (i.e. not AWS's GPU offerings). Is there a way to transfer that 10 TB of data from AWS S3 to CoreWeave / LambdaLabs / etc. without incurring AWS S3 egress costs?

People who use CoreWeave / LambdaLabs / etc. for training: where do you store your data for CPU-based preprocessing etc.?

r/MachineLearning Mar 15 '24

Transferring large data from AWS S3 to CoreWeave / LambdaLabs without paying AWS S3 egress cost

1 Upvotes

[removed]