Flask applications can handle file uploads and store them in Amazon S3 with a few key configurations.
Here’s how an upload endpoint might look in a Flask app. With this code, a file named my_document.pdf would be stored in the bucket under the key uploads/my_document.pdf:
from flask import Flask, request, render_template
import boto3
from botocore.exceptions import NoCredentialsError

app = Flask(__name__)

# S3 Configuration
S3_BUCKET = 'your-unique-bucket-name'
S3_KEY = 'your-s3-access-key'  # Consider using environment variables or IAM roles
S3_SECRET = 'your-s3-secret-key'  # Consider using environment variables or IAM roles

s3 = boto3.client(
    's3',
    aws_access_key_id=S3_KEY,
    aws_secret_access_key=S3_SECRET
)

@app.route('/')
def index():
    return render_template('upload.html')

@app.route('/upload', methods=['POST'])
def upload_file():
    # Reject requests that carry no file field or an empty selection
    if 'file' not in request.files:
        return "No file part in the request.", 400
    file = request.files['file']
    if file.filename == '':
        return "No selected file.", 400
    try:
        # Read the whole upload into memory as raw bytes
        file_content = file.read()
        # Construct a unique filename, e.g., using timestamp or UUID
        # For simplicity here, we'll use the original filename, but this is NOT recommended for production
        s3_filename = f"uploads/{file.filename}"
        s3.put_object(
            Bucket=S3_BUCKET,
            Key=s3_filename,
            Body=file_content,
            ContentType=file.content_type
        )
        return f"File '{file.filename}' uploaded successfully to S3 bucket '{S3_BUCKET}' as '{s3_filename}'."
    except NoCredentialsError:
        return "AWS credentials not found. Please configure your credentials.", 500
    except Exception as e:
        return f"An error occurred: {str(e)}", 500

if __name__ == '__main__':
    app.run(debug=True)
And the corresponding upload.html template:
<!doctype html>
<html>
  <head>
    <title>Upload File to S3</title>
  </head>
  <body>
    <h1>Upload a File</h1>
    <form action="{{ url_for('upload_file') }}" method="post" enctype="multipart/form-data">
      <input type="file" name="file">
      <input type="submit" value="Upload">
    </form>
  </body>
</html>
The core problem this solves is decoupling file storage from your application server. Instead of filling up your server’s disk, you offload the storage burden to a scalable, durable, and cost-effective object storage service like Amazon S3. This improves application performance, reduces operational overhead for managing storage, and provides built-in redundancy.
Internally, boto3, the AWS SDK for Python, handles the heavy lifting. When s3.put_object is called, boto3 establishes a connection to S3 using the provided credentials. It then reads the file_content (which is the raw bytes of the uploaded file) and uploads it as the Body to the specified Bucket and Key. The Key is essentially the path and filename within your S3 bucket. ContentType is crucial for S3 to serve the file with the correct MIME type.
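Because the content type reported by Flask comes from the browser, it can be missing. The sketch below shows one possible fallback based on the standard library's mimetypes module. The helper name resolve_content_type is hypothetical and not part of Flask or boto3.

```python
import mimetypes


def resolve_content_type(filename, declared_type=None):
    """Pick a MIME type to send as ContentType (hypothetical helper)."""
    # Prefer the type declared in the upload request, if there is one.
    if declared_type:
        return declared_type
    # Otherwise guess from the file extension; guess_type returns (type, encoding).
    guessed_type, _encoding = mimetypes.guess_type(filename)
    # Fall back to a generic binary type when nothing can be guessed.
    return guessed_type or "application/octet-stream"


print(resolve_content_type("my_document.pdf"))
```

In the route, the result could be passed as ContentType in place of file.content_type. Either value is only a hint derived from the client or the filename, so it should not be treated as proof of what the file actually contains.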
The levers you control are primarily:
S3_BUCKET: The name of your S3 bucket. This must be globally unique.
S3_KEY and S3_SECRET: Your AWS access key ID and secret access key. For production, it’s highly recommended to use IAM roles attached to your EC2 instance or Lambda function, or to use environment variables (AWS_ACCESS_KEY_ID, AWS_SECRET_ACCESS_KEY). boto3 picks both up automatically when no keys are passed to boto3.client.
s3_filename: This is the path and name of the file as it will appear in S3. You have complete control over this. Generating unique filenames (e.g., using uuid.uuid4() or timestamps) is essential to prevent overwrites and ensure proper organization.
ContentType: put_object does not infer this for you. If it is omitted, S3 stores the object with a generic binary type (binary/octet-stream), so setting it explicitly ensures correct behavior when the file is later served.
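As a minimal sketch of the unique-filename idea, the helper below builds a key from a random UUID and keeps only the original extension. The function name build_s3_key and the uploads prefix default are assumptions for illustration.

```python
import os
import uuid


def build_s3_key(original_filename, prefix="uploads"):
    """Return a collision-resistant S3 key (hypothetical helper)."""
    # splitext returns ('name', '.ext'), or ('name', '') when there is no extension.
    _, extension = os.path.splitext(original_filename)
    # uuid4().hex is 32 random hex characters, so repeated uploads of the
    # same filename end up under different keys.
    return f"{prefix}/{uuid.uuid4().hex}{extension.lower()}"


print(build_s3_key("my_document.pdf"))
```

Discarding the user-supplied name also keeps odd characters and path segments out of the key. If the original name matters for display, it could be stored separately, for example in your application's database.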
A common pitfall is how you handle the file object itself. request.files['file'] is a Werkzeug FileStorage object that wraps a stream. Calling .read() on it consumes the stream, so a second .read() on the same object returns empty bytes. If you need to both upload to S3 and process the file in memory (e.g., for validation or resizing), read it once into a variable (file_content = file.read()) and use that variable for all subsequent operations. If you need to read the stream itself again, for example to save it locally as well, rewind it first with file.seek(0).
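The behaviour is easy to reproduce with any file-like stream. In the sketch below, io.BytesIO stands in for the upload's underlying stream, which is an assumption made so the example runs without Flask. FileStorage forwards read() and seek() to its stream, so the same pattern should apply to it.

```python
import io

# Stand-in for the uploaded file's stream (assumption for illustration).
stream = io.BytesIO(b"%PDF-1.7 example bytes")

first_read = stream.read()    # consumes the stream up to the end
second_read = stream.read()   # position is at the end, so nothing is left

stream.seek(0)                # rewind to the beginning
third_read = stream.read()    # the full content is available again

print(len(first_read), len(second_read), len(third_read))
```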
One caveat about large files: put_object, as used above, sends the entire body in a single request, which is limited to 5 GB and requires holding the whole file in memory. boto3’s managed transfer methods, upload_file and upload_fileobj, behave differently. For files larger than a configurable threshold (8 MB by default), they break the file into parts, upload the parts in parallel, and then complete the multipart upload. This is a significant performance optimization that happens transparently, so you don’t need to manage chunking yourself, but you do need to call one of those methods rather than put_object to get it.
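A minimal sketch of routing the upload through upload_fileobj, boto3's managed transfer method, which is where the multipart handling takes place. The wrapper name upload_stream and its parameters are hypothetical. The s3_client argument is assumed to be a boto3 S3 client.

```python
def upload_stream(s3_client, fileobj, bucket, key, content_type, transfer_config=None):
    """Stream a binary file-like object to S3 via boto3's managed transfer.

    transfer_config may be a boto3.s3.transfer.TransferConfig instance;
    None keeps boto3's default thresholds and concurrency.
    """
    s3_client.upload_fileobj(
        Fileobj=fileobj,
        Bucket=bucket,
        Key=key,
        # put_object takes ContentType directly; the managed transfer
        # methods take it through ExtraArgs instead.
        ExtraArgs={"ContentType": content_type},
        Config=transfer_config,
    )
```

In the route above this might be called as upload_stream(s3, file.stream, S3_BUCKET, s3_filename, file.content_type). Passing TransferConfig(multipart_threshold=8 * 1024 * 1024) from boto3.s3.transfer would make the threshold explicit. Since the stream is read in chunks, the separate file.read() call would no longer be needed.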
The next step is often implementing security for these uploaded files, such as restricting access to only authenticated users or preventing direct public access.