Notes on study of shrine implementation

Developing software that is both simple and very flexible/composable is hard, especially in shared dependencies. Flexiblity and composability often lead to very abstract, hard to understand architecture. An architecture custom-fitted for particular use cases/domains has an easier time of remaining simple with few moving parts. I think this is a fundamental tension in software architecture.

shrine is a “File Attachment toolkit for Ruby applications”, developed with explicit goals of being more flexible than some of what came before. True to form, it’s internal architecture can be a bit confusing.

I want to work with shrine, and develop some new functionality based on it, related to versions/derivatives (hopefully for submission to shrine core), requiring some ‘under the hood’ work. When I want to understand some new complicated architecture (say, some part of Rails), one thing I do is trace through it with a debugger (while going back and forth with documentation and code-reading), and write down notes with a sort of “deep dive” tour through a particular code path. So that’s what I’ve done here, with shrine 2.12.0. It may or may not be useful to anyone else, part of the use for me is in writing it; but when I’ve done this before for other software others have found it useful, so I’ll publish it in case it is (and so I can keep finding it again later to refer to it myself, which I plan to do).

Some architectural overview

shrine uses a plugin system based on module mix-in overrides (basically, inheritance),  which is not my favorite form of extension (many others would agree). Most built-in shrine func is implemented as plugins, to support flexible configuration. This mixin-overridden-methods architecture can lead to some pretty tightly coupled and inter-dependent code, even in ostensibly independent plugins, and I think it has sometimes here.  Still, shrine has succeeded in already being more flexible than anything that’s come before (definitely including ActiveStorage). This is just part of the challenge of this kind of software development, I don’t think anyone else starting over is gonna get to a better overall place, I still think shrine is the best thing to work with at present if you need maximal flexibility in handling your uploaded assets.

Shrine has a design document that explains the different objects involved. I still found it hard to internalize a mental model, even with this document. After playing with shrine for a while, here’s my own current re-stating of some of the primary objects involved in shrine (hopefully my re-statement doesn’t have too many errors!).

An uploader (also called a “shrine” object, as the base class is just Shrine) is a  stateless object that knows how to take an IO stream and persist to some back-end.   You generally write a custom uploader class for your app, because a specific uploader is what has specifics about any validationtransformationmetadata extraction, etc, in ingesting a file. An uploader is totally  stateless though (or rather immutable, it may have some config state set on initialize) — it’s sort of a pipeline for going from an IO object to a persisted file.  When you write a custom uploader, it isn’t hard-coded to a particular persistent back-end, rather a specific storage object is injected into an individual uploader instance at runtime.

A shrine attacher is the object that has state for the file. An attacher knows about the model object the file is attached to (a specific attacher instance is associated with a specific model instance).  An attacher has two uploaders injected into it — one for the temporary cache storage and one for the permanent store storage. These are expected to be the same class of uploader, just with different storages injected.  An attacher has ORM plugins that handle actual persistance to the db, as well as tracking changes, and just everything that needs to be done regarding the state of a particular file attachment.

In a typical model, you can get access to the attacher instance for an asset called avatar from a method called avatar_attacher. The avatar method itself is essentially delegated through the attacher too. The attacher is the thing managing access and mutation of the attached files for the model.  If you ask for avatar_attacher.store or avatar_attacher.cache, you get back an uploader object corresponding to that form of storage — to be used to process and persist files to either of those storages.

How do those methods avatar and avatar_attacher wind up in the model?  A ruby module is mixed in to the model with those methods. Shrine calls this mix-in module an “attachment”. When you do include MyUploader::Attachment.new(:name_of_column) in your model, that’s returning an attachment module and mixing it into your model.  I find “attachment” not the most clear name for this, especially since shrine documentation also calls an individual file/bytestream an “attachment” sometimes, but there it is.

And finally, there’s the simple UploadedFile, which is simply a model object representing an uploaded file! It can let you get various information about the uploaded file, or access it (via stream, downloaded file, or url).  An UploadedFile is more or less immutable. It’s what you get returned to you from the (eg) avatar method itself.  An UploadedFile can be round-trip serialized to json — the json that is persisted in your model _data column. So an UploadedFile is basically the deserialized model representation of what’s in your _data column.

It’s important to remember that shrine uses a two-step file persistence approach. There is a temporary cache storage location that has files that may not pass validation and may not yet have been actually saved to a model (or may never be).  The file can be re-displayed to a user in a validation error when it’s in “cache” for instance. Then when the file is actually succesfully permanently persisted attached to a model, it’s in a different storage location, called the store.

Tracing what happens internally when you attach a file to an ActiveRecord model using shrine

Most of this will be relevant regardless of ActiveRecord, but I focused on an ActiveRecord implementation. My demonstration app used to step through uses a bog-standard Shrine uploader, with no plugins (but :activerecord).

class StandardUploader < Shrine
  plugin :activerecord
end

Just to keep things consistent, we attach to a model on the “standard_data” column, with accessor called “standard”.

  include StandardUploader::Attachment.new(:standard)

What is shrine doing under the hood, what are the different parts, when we assign a new file to the model?  We’ll first do model.standard = File.open("something"), and then model.save.

First model.standard = File.open("something")

The #standard= is provided by the attachment module mix-in, and it calls  asset_attacher.assign(io_object)

If it’s NOT a string, assign first does: `uploaded_file = [attacher.]cache!(value, action: :cache)` (What’s up with ‘not a string’? A string is assumed to be serialized json from a form representing an already existing file. The assign method assumes it’s either an IO object or serialized JSON from a form; there are other methods than `assign` to directly set an UploadedFile or what have you).

The cache! method calls uploaded_file = cache.upload(io)cache points to an instance of our StandardUploader configured to point at the configured ‘cache’ (temporary) storage, so we’re calling upload on an uploader.

[cache uploader]#upload calls processed to run the IO through any uploader-specific processing that is active on the “cache” stage.

Then it calls #store on itself, the uploader assigned as `cache`. “Uploads the file and returns an instance of Shrine::UploadedFile. By default the location of the file is automatically generated by #generate_location, but you can pass in `:location` to upload to a specific location. [ie path, the actual container storage location is fixed though]”  The implementation is via an indirection through #_store, which:

1.  calls get_metadata on itself (an uploader), which for a new IO object calls extract_metadata, which is overridden by custom metadata plugins. So metadata is normally assigned at the cache/assignment phase. This is perhaps so the metadata can be used in validation?  Not sure if there’s a way to make metadata be in the background, and/or be as part of the promotion step (when copying cache to store on save) instead. There’s some examples suggesting they are relevant here, but I don’t really understand them.

2. Calls #put on itself, the uploader. put by default does nothing but call #copy on the uploader, which actually calls #upload on the actual storage object itself (say a Shrine::Storage::FileSystem), to send the file to that storage adapter — in this case for the configured cache storage, since we started from cache on the attacher. (Some plugins may override put to do more than just call copy). 

3. Converts into a shrine UploadedFile object representing the persisted file, and returns that.

So at this point, after calling attacher.cache!, your file has been persisted to the temporary “cache” storage. attacher.cache! purely deals with the stateless uploader and persisting the file; next is making sure that is recorded in your model _data attribute.

[attacher].assign then does ‘[attacher.]set(uploaded_file)’, where uploaded_file is what was returned from the previous cache! call. set first stores the existing value (which could be nil or an an UploadedFile) in the attacher instance variable @old, (in part so it can be deleted from storage on model persistence, since it’s been replaced).  And then calls _set to convert the UploadedFile to a hash, and write it to the _data model attribute — so it’s there ready for persistence if/when the model is saved.

So after assignment (model.standard = File.open("whatever")), the file is persisted in your “cache” storage. The in-memory model has asset_data that points to it. But nothing about that is persisted to your model’s persistence/ORM.  If the model previously had a different file attached, it’s still there in the store storage.

Let’s see how persistence of the new file happens, by tracing the ActiveRecord ORM plugin specifically, when you call model.save.  First note the active_record plugin makes sure shrine’s validations get used by the model, so if they fail, ActiveRecord’s save is normally going to get a validation failure, and not go further. If we made it past there:

In an active_record before_save, it calls attacher.save if and only if the attacher is changed?, meaning has set the @old ivar of previous value (could be nil previous value, but the ivar is set). However, the default/core implementation of save doesn’t actually do anything — this seems mainly here as a place for shrine plugins to hook into actually “before_save”, in an ORM-agnostic way.  (Might have been less confusing to call it before_save, I dunno).  The file is not moved to the permanent storage (and the old file deleted from permanet storage) until after the model has been succesfully persisted.

Then ActiveRecord’s own save has happened — the file data representing the new file persisted in temporary cache has now been persisted to the database.

Then in an active_record after_commit, finalize is called on the attacher. finalize is only called if  @old  is set — so only if the attached file was changed, basically.

The [attacher.]finalize method itself immediately returns if there is no “@old” instance variable set. (So the check with changed? in the hook is actually redundant, even if you call finalize every time, it’ll just return. Unless plugins change this).

Then finalize calls [attacher.]replace. Which — if the @old instance variable is not nil (in which it’s an UploadedFile object), and the object was in the cache storage (it must be in store storage; checked simply by checking the storage_key in the data hash) deletes the old value. “replace” in this case actually means “delete old value” — it doesn’t do anything with the new value, whether the new value is in cache or store. (not to be confused with a different #replace method on UploadedFile, which actually only deals with uploading a new file. These are actually each two halves of what I’d think of as “replacement”, and perhaps would have best had entirely different names — especially cause they both sound similar to the different “swap” method). 

The finalize method removes the @old ivar from the attacher, so the attacher no longer thinks it has an un-persisted change. (would this maybe be safer AFTER the next step?)

finalize calls `_promote(action: :store) if cached?` — that is, if the current UploadedFile exists, and is associated with the cache store.   [attacher.]#_promote just immediately calls promote —  both of these methods can take an uploaded_file argument, but here they are not given one, and default to the current UploadedFile in this attacher, via get

[attacher.]promote does a `stored_file = store!(uploaded_file, **options)`.  Remember the `cache!` method above? `store!` is just the same, but on the uploader configured as `store` storage instead of `cache` storage — except this time we’re passing in an UploadedFile instead of some not-yet-imported io object. Metadata extraction isn’t performed a second time, because, get_metadata has special behavior for UploadedFile input, to just copy existing metadata instead of re-extracting it.

At this point, the file has been copied/moved to the ‘store’ storage — but another copy of the file may still exist in cache​ storage (in some cases where the cache and store storages are compatible, the file really was moved rather than copied though), and no state changes have been made at all to the model, either in-memory or persisted, to point to this new file in permanent storage.

So to deal with both those things, [attacher].promote calls [attacher.]swap, which is commented as “Calls #update, overriden in ORM plugins, and returns true if the attachment was successfully updated.” In fact, the over-ridden attacher.update in the activerecord plugin just calls super, and then saves the AR model with validate:false. (I am not a fan of the thing going around my validations, wonder what that’s about).

Default update(uploaded_file) just calls _set(uploaded_file).

_set pretty much just converts the UploadedFile to it’s serializable json, and then calls write.

write just sets the model attribute to the serializable data (it’s still not persisted, until it gets to the ORM-specific update, where as a last line the model with new data is persisted).

so I think attacher.swap actually just takes the UploadedFile, serializes it to the _data column in the model, and saves/persists the model. Not sure why this is called swap. I think it might be more clear as “update” — oops, but we already have an update, which is by default all that swap calls. I’m not sure the different intent between swap and update, when you should use one vs the other.  (This is maybe one place to intervene to try to use some kind of optimistic or pessimistic locking in some cases)

If swap returns a falsey value (meaning it failed), then promote will go and delete the file persisted to the store storage, to try and keep it from hanging around if  it wasn’t persisted to model.  I don’t totally understand in what cases swap will return a falsey value though. I guess the backgrounding plugin will make it return nil if it thinks the persisted data has changed in db (or the model has been deleted), so a promotion can’t be done.

overview cheatsheet

pseudo-code-ish chart of call stack of interesting methods, not real code

model.avatar=(io)   =>  avatar_attacher.assign(io)

↳ uploaded_file = avatar_attacher.cache!(io)

↳  avatar_attacher.cache.upload(io) => processes including extracting metadata and persists to storage, by calling avatar_attacher.cache.store(io)

↳ io = uploader.processed(io)

↳ io = uploader.store(io) => via uploader._store(io)

↳ get_metadata

↳ uploader.put(io) => actually file persists to storage

returns an UploadedFile

↳ avatar_attacher.set(uploaded_file)

↳ stores previous value in attacher ivar “@old”, puts serialized UploadedFile in-memory avatar_data attribute

model.save

an activerecord before_save triggers avatar_attacher.save iff attacher.changed? (has an @old ivar). Core attacher.save doesn’t do anything, but some plugins hook in.

activerecord does the save, and commit.

an active_record after_commit triggers avatar_attacher.finalize iff attacher.changed?

↳ attacher._promote/promote iff  attacher.changed?

↳ stored_file = avatar_attacher.store!( UploadedFile in-memory )

↳ see above at cache! — extra metadata, does other processing/transformation, persists file to store storage, updates in-memory UploadedFile and serialization.

 ↳ attacher.swap(newly persisted UploadedFile)

↳ attacher.update(newly persisted UploadedFile) => just calls _set(uploaded_file), which properly serializes it to in-memory data, and then in an activerecord plugin override, persists to db with activerecord.

Some notes

On method names/semantics

“Naming” things is often called (half-jokingly half-serious) one of the hardest problems in computer science, and that is truer the more abstract you get. Sometimes there just aren’t enough English words to go around, or words that correctly convey the meaning. In this architecture, I think both the replace methods probably should have been named something else to avoid confusion, as neither one does what I’d think of as a “replace” operation.

In general, if one needs to interact with some of these methods directly (rather than just through the existing plugins), either to develop a new plugin or to call some behavior directly without a plugin being involved — it’s not always clear to me which method to use. When I should use swap vs update , which in the base implementation kind of do the same thing, but which different plugins may change in different ways? I don’t understand the intended semantics, and the names aren’t helping me. (promote is similar, but with an UploadedFile which hasn’t yet been processed/persisted? Swap/update takes an UploadedFile which has already been persisted, for updating in model).

It is worth noting that all of these will both change the referenced attached file on a model and persist the whole model to the db. If you just want to set a new attached file in the in-memory model without persisting, you’d use “attacher.set(uploaded_file)” — which requires an UploadedFile object, not just an IO. Also if you call set multiple times without saving, only the penultimate one is in the @old variable — I’m not sure if that can lead to some persisted files not being properly deleted and being orphaned?

Shrine plugins do their thing by overriding methods in the core shrine — often the methods outlined above. Some particularly central/complicated plugins to look at are backgrounding and versions (although we’re hoping to change/replace “versions”) — they are very few lines of code, but so abstract I found it hard to wrap my head around.  I found that the understanding of what unadorned base shrine does above was necessary  to truly understand what these plugins were doing.

Are there ways to orphan attached files in shrine?  That is, a file still stored in a storage somewhere, but no longer referenced in a model?  For starters the “cache” storage is kind of designed to have orphaned files, and needs to have old files cleaned out periodically, like a “tmp” directory. While there is a plugin designed to try to clean up some files in “cache”, they can’t possibly catch everything — like a file in “cache” that was associated with a model that was never saved at all (perhaps cause of validation error) — so I personally wouldn’t bother with it, just assume you need to sweep cache, like the docs suggest you do.

Are there other ways for files to end up orphaned in shrine, including in the “store” storage? If an exception is raised at just the wrong time?  I’m not sure, but I’d like to investigate more. An orphaned file is gonna be really hard to discover and ever delete, I think.

 

Advertisements

Leave a Reply

Fill in your details below or click an icon to log in:

WordPress.com Logo

You are commenting using your WordPress.com account. Log Out /  Change )

Google+ photo

You are commenting using your Google+ account. Log Out /  Change )

Twitter picture

You are commenting using your Twitter account. Log Out /  Change )

Facebook photo

You are commenting using your Facebook account. Log Out /  Change )

Connecting to %s