Speeding up S3 URL generation in ruby

It looks like the AWS SDK is very slow at generating S3 URLs, both public and presigned, and that you can generate them around an order of magnitude faster yourself in both cases. This can matter if you are generating hundreds of S3 URLs at once.

My app

The app I work on is a “digital collections” or “digital asset management” app. It is about displaying lists of files, so it displays a LOT of thumbnails. The thumbnails are all stored in S3, and at present we generate URLs directly to S3 in src‘s on the page.

Some of our pages can have 600 thumbnails. (Say, a digitized medieval manuscript with 600 pages). Also, we use srcset to offer the browser two resolutions for each image, so that’s 1200 URLs.

Is this excessive, should we not put 600 URLs on a page? Maybe, although it’s what our app does at present. But 100 thumbnails on a page does not seem excessive; imagine a 10×10 grid of postage-stamp-sized thumbs, why not? And they each could have multiple URLs in a srcset.

It turns out that S3 URL generation can be slow enough to be a bottleneck with 1200 generations in a page, or in some cases even 100. But it can be optimized.

On Benchmarking

It’s hard to do benchmarking in a reliable way. I just used Benchmark.bmbm here; it is notable that on different runs of my comparisons, I could see results differ by 10-20%. But this should be sufficient for relative comparisons and basic orders of magnitude. Exact numbers will of course differ on different hardware/platform anyway. (benchmark-ips might possibly be a way to get somewhat more reliable results, but I didn’t remember it until I was well into this. There may be other options?).
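For illustration, here’s the general shape of the Benchmark.bmbm harness I’m talking about; the bucket name and keys below are made up, and it assumes AWS region/credentials are already configured in the environment.

require 'benchmark'
require 'aws-sdk-s3'

# Made-up bucket and 1200 made-up keys, just to show the shape of a comparison.
bucket  = Aws::S3::Bucket.new("some-bucket")
keys    = Array.new(1200) { |i| "path/to/image_#{i}.jpg" }
objects = keys.map { |key| bucket.object(key) }

Benchmark.bmbm do |x|
  x.report("sdk public_url") do
    objects.each(&:public_url)
  end

  x.report("naive string concatenation") do
    keys.each { |key| "https://#{bucket.name}.s3.amazonaws.com/#{key}" }
  end
end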

I ran benchmarks on my 2015 MacBook (2.9 GHz dual-core Intel Core i5).

I’m used to my MacBook being faster than our deployed app on an EC2 instance, but in this case running benchmarks on EC2 had very similar results. (Of course, EC2 instance CPU performance can be quite variable).

Public S3 URLs

A public S3 URL might look like https://bucket_name.s3.amazonaws.com/path/to/my/object.rb . Or it might have a custom domain name, possibly to a CDN. Pretty simple, right?

Using shrine, you might generate it like model.image_url(public: true), which calls Aws::S3::Object#public_url. Other dependencies or your own code might call the AWS SDK method as well.
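For reference, stripped of shrine, that SDK call looks something like this (bucket and key made up; the exact hostname format in the output can vary by region and SDK version):

require 'aws-sdk-s3'

# Made-up bucket/key; assumes region and credentials are configured.
object = Aws::S3::Object.new(bucket_name: "somebucket", key: "path/to/image.jpg")

object.public_url
# => something like "https://somebucket.s3.amazonaws.com/path/to/image.jpg"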

I had noticed in earlier profiling that generating S3 URLs seemed to be taking much longer than I expected, looking like a bottleneck for my app. We use shrine, but shrine doesn’t add much overhead here, it’s pretty much just calling out to the AWS SDK public_url or presigned_url methods.

It seems like generating these URLs should be very simple, right? Here’s a “naive” implementation based on a shrine UploadedFile argument. Obviously it would be easy to substitute a custom or CDN hostname in this implementation.

def naive_public_url(shrine_file)
  "https://#{["#{shrine_file.storage.bucket.name}.s3.amazonaws.com", *shrine_file.storage.prefix, shrine_file.id].join('/')}"
end

naive_public_url(model.image)
#=> "https://somebucket.s3.amazonaws.com/path/to/image.jpg"

Benchmark generating 1200 URLs with naive implementation vs a straight call of S3 AWS SDK public_url…

                                              user       system     total        real
original AWS SDK public_url implementation    0.053043   0.000275   0.053318 (  0.053782)
naive implementation                          0.004730   0.000016   0.004746 (  0.004760)

53ms vs 5ms: it is indeed an order of magnitude slower.

53ms is not peanuts when you are trying to keep a web response under 200ms, although it may not be terrible. But let’s see if we can figure out why it’s so slow anyway.

Examining with ruby-prof points at what we can already see in the AWS SDK source for public_url itself; no need to dig far down the stack. The most expensive elements are the URI.parse and the URI-safe escaping. So are we missing anything in our naive implementation?
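If you want to reproduce that kind of profiling yourself, a minimal sketch (objects here is assumed to be an array of Aws::S3::Object instances):

require 'ruby-prof'

# Profile a batch of public_url calls and print a flat report of where time went.
result = RubyProf.profile do
  objects.each(&:public_url)
end

RubyProf::FlatPrinter.new(result).print($stdout)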

Well, the URI.parse is just done to make sure we are operating only on the path portion of the URL. But I can’t figure out any way bucket.url would return anything but a hostname-only URL with an empty path; all the examples in the docs are like that. Maybe it could somehow include a path, but I can’t see how the URL being parsed would ever have a ? query component or # fragment, and without those it’s safe to just append things without parsing. (Even without that assumption, there are faster ways to do this than a full parse, which is quite slow!) Also, just calling bucket.url is itself a bit expensive, and can involve live arn: lookups we won’t be using.

URI Escaping, the pit of confusing alternatives

What about escaping? Escaping can be such a confusing topic with S3, with different libraries at different times handling it differently (or wrongly), that it would be sane to just never use any characters in an S3 key that need escaping; maybe put some validation on your setters to ensure this. Then you don’t need to take the performance hit of escaping at all.

But okay, maybe we really need/want escaping to ensure any valid S3 key is turned into a valid S3 URL. Can we do escaping more efficiently?

The original implementation splits the path on / and then runs each component through the SDK’s own Seahorse::Util.uri_escape(s). That method’s implementation uses CGI.escape, but then does two gsub‘s to alter the value, since it isn’t happy with CGI.escape’s output as-is. Those extra gsubs are an additional performance hit. I think we can use ERB::Util.url_encode instead of CGI.escape + gsubs to get the same behavior, which might get us some speed-up.
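A quick sanity check of that claimed equivalence on one sample string, with the SDK’s escaping paraphrased as CGI.escape plus the two gsubs described above (my own spot check, not exhaustive):

require 'cgi'
require 'erb'

s = "my file!.jpg"

# Roughly what the SDK's per-component escaping boils down to:
CGI.escape(s).gsub("+", "%20").gsub("%7E", "~")
# => "my%20file%21.jpg"

# ERB::Util.url_encode gives the same result for this input in one pass:
ERB::Util.url_encode(s)
# => "my%20file%21.jpg"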

But we also seem to be escaping more than is necessary. For instance, it will escape any ! in a key to %21, and it turns out this isn’t necessary at all; the URL resolves just fine without escaping it. If we escape only what is needed, can we go even faster?

I think what we actually need is what URI.escape does, and since URI.escape doesn’t escape /, we don’t need to split on / first, saving us even more time. Annoyingly, URI.escape is marked obsolete/deprecated! But its stdlib implementation is relatively simple pure ruby, so it would be easy enough to copy it into our codebase.
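One low-effort alternative to copying the source in: I believe URI.escape just delegates to URI::DEFAULT_PARSER.escape, which is still available (and not deprecated) even in rubies where URI.escape itself is gone. Treat that equivalence as an assumption to verify, but the behavior looks like this:

require 'uri'

# Percent-encodes "unsafe" characters, but leaves `/` (and `!`) alone,
# so there's no need to split the path on `/` first.
URI::DEFAULT_PARSER.escape("path/to/my file!.jpg")
# => "path/to/my%20file!.jpg"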

Even faster? The somewhat maintenance-neglected but still-working escape_utils gem has C implementations of some escaping routines. It’s hard when many implementations aren’t clear on exactly what they are escaping, but I think its escape_uri (note the i on the end, not l) is doing the same thing as URI.escape. Alas, there seems to be no escape_utils implementation that corresponds to CGI.escape or ERB::Util.url_encode.
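EscapeUtils.escape_uri usage is just the following; the expected output assumes it really does match URI.escape, as I think it does:

require 'escape_utils' # gem 'escape_utils'

EscapeUtils.escape_uri("path/to/my file!.jpg")
# => "path/to/my%20file!.jpg" (assuming URI.escape-compatible behavior)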

So now we have a bunch of possibilities, depending on whether we are willing to change escaping semantics and/or use our naive hostname construction (run times relative to the original AWS SDK implementation at 100%):

Original AWS SDK public_url: 100%
Optimized AWS SDK public_url (avoid the URI.parse, use ERB::Util.url_encode; should be functionally identical, same output, I think!): 60%
Naive implementation (no escaping of the S3 key at all): 7.5%
Naive + ERB::Util.url_encode (should be functionally identical escaping to the original implementation, i.e. over-escaping): 28%
Naive + URI.escape (what we think is sufficient escaping, done much faster): 15%
Naive + EscapeUtils.escape_uri (we think identical to URI.escape, but a faster C implementation): 11%

We have a bunch of opportunities for much faster implementations, even keeping the existing over-escaping behavior. Here’s the file I used to benchmark.

Presigned S3 URLs

A presigned URL is used to give access to non-public content, and/or to specify response headers you’d like S3 to include with the response, such as Content-Disposition. Presigned S3 URLs all have an expiration (max one week), and involve a cryptographic signature.

I expect most people are using the AWS SDK for these, rather than reinvent an implementation of the cryptographic signing protocol.
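The standard SDK call looks something like this (bucket/key made up; the response_content_disposition param is the kind of per-URL response header override mentioned above):

require 'aws-sdk-s3'

object = Aws::S3::Object.new(bucket_name: "somebucket", key: "path/to/image.jpg")

object.presigned_url(
  :get,
  expires_in: 900, # seconds, up to one week
  response_content_disposition: 'attachment; filename="image.jpg"'
)
# => "https://somebucket.s3.amazonaws.com/path/to/image.jpg?X-Amz-Algorithm=AWS4-HMAC-SHA256&..."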

And we’d certainly expect these to be slower than public URLs, because of the crypto signature involved. But can they be optimized too? It looks like yes, again by about an order of magnitude.

Benchmarking with AWS SDK presigned_url, 1200 URL generations can take around 760-900ms. Wow, that’s a lot — this is definitely enough to matter, especially in a web app response you’d like to keep under 200ms, and this is likely to be a bottleneck.

We do expect the signing to take longer than a public url, but can we do better?

Look at what the SDK is doing, re-implement a quicker path

The presigned_url method just instantiates and calls out to an Aws::S3::Presigner. First idea: what if we create a single Aws::S3::Presigner and re-use it 1200 times, instead of instantiating it 1200 times, passing it the same args #presigned_url would? Tried that; it was only a minor performance improvement.
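That re-use experiment was shaped roughly like this; AWS_CLIENT stands in for however you already have an Aws::S3::Client configured, and the 900-second expiration is just an example:

require 'aws-sdk-s3'

# Build one Presigner up front instead of letting every #presigned_url call
# construct its own. (As noted, this was only a minor improvement.)
PRESIGNER = Aws::S3::Presigner.new(client: AWS_CLIENT)

def reused_presigner_url(shrine_file)
  PRESIGNER.presigned_url(
    :get_object,
    bucket: shrine_file.storage.bucket.name,
    key: [*shrine_file.storage.prefix, shrine_file.id].join("/"),
    expires_in: 900
  )
end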

OK, let’s look at the Aws::S3::Presigner implementation. It’s got kind of a convoluted way of getting a URL: building a Seahorse::Client::Request, and then doing something weird with it… maybe modifying it to not actually go to the network, but just act as if it had, returning headers and a signed URL, and then we throw out the headers and just use the signed URL… phew! Ultimately, though, it does the actual signing work with another object, an Aws::Sigv4::Signer.

What if we just instantiate one of these ourselves, with the same arguments the Presigner would use for our use cases, and then call presign_url on it with the same args the Presigner would pass? Let’s also re-use a single Signer object 1200 times instead of instantiating it each time, in case that matters.

We still need to create the public_url in order to sign it. Let’s use our replacement naive implementation with URI.escape escaping.

AWS_SIG4_SIGNER = Aws::Sigv4::Signer.new(
  service: 's3',
  region: AWS_CLIENT.config.region,
  credentials_provider: AWS_CLIENT.config.credentials,
  unsigned_headers: Aws::S3::Presigner::BLACKLISTED_HEADERS,
  uri_escape_path: false
)

def naive_with_uri_escape_escaping(shrine_file)
  # because URI.escape does NOT escape `/`, we don't need to split on it,
  # which is what actually saves us the time.
  path = URI.escape(shrine_file.id)
  "https://#{["#{shrine_file.storage.bucket.name}.s3.amazonaws.com", *shrine_file.storage.prefix, path].join('/')}"
end

# not yet handling custom query params eg for content-disposition
def direct_aws_sig4_signer(url)
  AWS_SIG4_SIGNER.presign_url(
    http_method: "GET",
    url: url,
    headers: {},
    body_digest: 'UNSIGNED-PAYLOAD',
    expires_in: 900, # seconds
    time: nil
  ).to_s
end

direct_aws_sig4_signer( naive_with_uri_escape_escaping( shrine_uploaded_file ) )
# => presigned S3 url

Yes, it’s much faster!

Bingo! Now I measure 1200 URLs in 170-220ms, around 25% of the time. Still too slow to want to do 1200 of them on a single page, and around 4x slower than SDK public_url.

Interestingly, while we expect the cryptographic signature to take some extra time… it seems to account for at most ~10% of the overhead that the SDK’s URL-signing logic was adding. We experimented with re-using an Aws::Sigv4::Signer vs instantiating one each time, and with applying URI escaping or not. These did make noticeable differences, but not astounding ones.

This optimized version would have to be enhanced to handle additional query param options, such as a specified content-disposition; I optimistically hope that can be done without changing the performance characteristics much.
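One plausible way to do that, since response overrides like Content-Disposition are just query parameters that get signed along with the rest of the URL: append them to the URL before calling presign_url. This is an untested sketch building on the code above, and the filename is made up:

require 'erb'

def direct_aws_sig4_signer_with_disposition(url, disposition)
  # response-content-disposition gets signed like any other query param
  url_with_params = url + "?response-content-disposition=#{ERB::Util.url_encode(disposition)}"
  direct_aws_sig4_signer(url_with_params)
end

direct_aws_sig4_signer_with_disposition(
  naive_with_uri_escape_escaping(shrine_uploaded_file),
  'attachment; filename="some-image.jpg"'
)
# => presigned S3 url with a content-disposition override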

Could it be optimized even more, by profiling within the Aws::Sigv4::Signer implementation? Maybe, but it doesn’t really seem worth it — we are already introducing some fragility into our code by using lower-level APIs and hoping they will remain valid even if AWS changes some things in the future. I don’t really want to re-implement Aws::Sigv4::Signer, just glad to have it available as a tool I can use like this already.

The Numbers

The script I used to compare performance in different ways of creating presigned S3 URLs (with a couple public URLs for comparison) is available in a gist, and here is the output of one run:

user system total real
sdk public_url 0.054114 0.000335 0.054449 ( 0.054802)
naive S3 public url 0.004575 0.000009 0.004584 ( 0.004582)
naive S3 public url with URI.escape 0.009892 0.000090 0.009982 ( 0.011209)
sdk presigned_url 0.756642 0.005855 0.762497 ( 0.789622)
re-use instantiated SDK Presigner 0.817595 0.005955 0.823550 ( 0.859270)
use inline instantiated Aws::Sigv4::Signer directly for presigned url (with escaping) 0.216338 0.001941 0.218279 ( 0.226991)
Re-use Aws::Sigv4::Signer for presigned url (with escaping) 0.185855 0.001124 0.186979 ( 0.188798)
Re-use Aws::Sigv4::Signer for presigned url (without escaping) 0.178457 0.001049 0.179506 ( 0.180920)

So what to do?

Possibly there are optimizations that would make sense in the AWS SDK gem itself? But it would actually take a lot more work to be sure what can be done without breaking some use cases.

I think there is no need to use URI.parse in public_url, the URIs can just be treated as strings and concatenated. But is there an edge case I’m missing?

Using a different URI escaping method definitely helps in public_url. But how many other people who aren’t me care about optimizing public_url? What escaping method is actually required/expected, and is changing it a backwards-compat problem? And is it okay, maintenance-wise, for the S3 object to use a different escaping mechanism than the SDK’s common Seahorse::Util.uri_escape workhorse, which might be used in places with different escaping requirements?

For presigned_urls, cutting out a lot of the wrapper code and using an Aws::Sigv4::Signer directly seems to have significant performance benefits. But what edge cases get broken there, do they matter, and can a regression be avoided with alternate code that is still performant and maintainable?

Figuring this all out would take a lot more research (and figuring out how to work with the ruby SDK’s test suite more easily than I can right now; it’s a test suite for the whole SDK, and it’s a bear to run the whole thing).

Although if any Amazon maintainers of the ruby SDK, or other experts in its internals, see this and have an opinion, I am curious as to their thoughts.

But I am a lot more confident that some of these optimizations will work fine for my use cases. One of the benefits of using shrine is that all of my code already accesses S3 URL generation via shrine API. So I could easily swap in a locally optimized version, either with a shrine plugin, or just a local sub-class of the shrine S3 storage class. So I may consider doing that.
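For instance, a local subclass might look something like the following. This is only a sketch of the idea, assuming the shrine 3.x Shrine::Storage::S3 API (a url(id, **options) method plus bucket and prefix readers); it isn’t code I’ve put into production:

require "uri"
require "shrine/storage/s3"

class FasterPublicUrlS3Storage < Shrine::Storage::S3
  # Fast path for public URLs: plain string building plus URI-style escaping.
  # Everything else (presigned URLs, custom host, etc.) falls through to the
  # stock shrine implementation.
  def url(id, public: nil, **options)
    if public
      path = URI::DEFAULT_PARSER.escape([*prefix, id].join("/"))
      "https://#{bucket.name}.s3.amazonaws.com/#{path}"
    else
      super
    end
  end
end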

10 thoughts on “Speeding up S3 URL generation in ruby”

  1. Oh wow, thanks so much for that comment, and the library! I had identified some paths to optimizing it, but there was a lot more work to do — and you’ve just done it and shared it, that’s so great, and so glad I found that out before trying to reinvent it!

  2. We didn’t implement that, but it certainly could be supported; could you make a PR for that? The key part is adding them in the constructor so that they can be cached across the generation calls.

    By the way – what’s interesting is that in our profiling the URL escaping didn’t show up nearly as visibly. Most likely because our object keys are all URL-safe.

  3. Hm, I need different headers for different URLs though. I set content-disposition to include the desired “Save as” filename for each URL, which is different for different ones. So setting in the constructor and caching does not meet my needs. There may be some other things I need slightly different than you have done too; we realize that cutting out the flexibility is *part* of what gets you the performance. I’d fork your code to do what I need… but I realize the license you use isn’t compatible with most of my projects either. :( Well, I’m not at the point where I’m ready for this *quite* yet either, I guess I’ll think on it and see when I get there!

  4. The URL escaping showed up for me mostly in *public* URL generating; maybe it gets lost in the overhead of actual presigned URL generating. That they are all already URL-safe shouldn’t make much difference; my test data was too, it still takes time for the escaping code to check every byte to see if it needs escaping.
