Needs advice on Amazon EFS and Amazon S3

We are creating a Document Management microservice on Google Cloud Platform, and we need to choose the best document storage option, considering all the pros and cons.

2 upvotes·1.9K views
Replies (2)

Since you are already on GCP, why not use their storage service (Google Cloud Storage) for that? Another option would be to build your own cluster on GCP compute instances with MinIO (https://min.io/), or to go with Backblaze (https://www.backblaze.com/). In both cases you get an S3-compatible API; MinIO just gives you more on top, like custom policies, hooks, and various integrations. And it is fast. Like, very fast.

5 upvotes·137 views
DevOps | Senior Developer ·

It really depends on your use case, your access requirements, and the level of development effort available for the implementation. EFS is going to be the more turnkey solution, with your typical tree-based folder/file system within logical volumes. In all my years, though, I've never implemented it. Always S3. So rather than speak about what I don't really know, I'll tell you what I know about S3.

S3 is object-based storage, meaning everything is essentially flat, but you can simulate the folder structure you are used to with virtual folders (key prefixes). It doesn't matter what the contents are, and nothing even requires a file extension. Policy control over files and folders is extremely granular, which is a big plus. Everything should be private by default, and then you manage access through bucket policies, or through IAM roles and permissions. You can also obfuscate object names and keep the real file name and extension in your DB, putting them back on an object only when you serve it to a user. This further protects your objects even if someone gains bucket access. You can and should hand out "signed request" (presigned) links to files, meaning a file is accessible to the intended user only for a small window of time.

S3 is also effectively infinite: there is no limit to how much you can persist. EFS is fully scalable in both performance and size too, but that elasticity comes at a higher cost. S3 also has lots of storage classes, from Standard (hot, accessed frequently) all the way to Glacier (ice cold, archived, basically never accessed). You can easily replicate objects to other regions for DR and redundancy. EFS can be reached from other VPCs using VPC peering as well.

S3 also has many useful features like lifecycle management, versioning, and many more. Virus scanning using tags and a Lambda is another way to keep your files safe from harm. Multipart uploads are an essential feature for massive objects (a single object can be multiple terabytes): you can chunk the file and send it in pieces for either speed or manageability. S3 also has advanced querying capabilities: it can be used with Glue and Athena to take raw data, assemble it, and then query it with standard SQL. You can do a similar thing to perform analytics in an AWS data lake.
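The obfuscation idea above (opaque object keys in the bucket, with the real file name kept in your DB) could look roughly like this. It is only a sketch: `DocumentRecord`, `DocumentIndex`, and the `docs` prefix are hypothetical names, the dictionary stands in for a real database table, and file names are assumed to be already sanitized.

```python
import uuid
from dataclasses import dataclass
from typing import Dict


@dataclass
class DocumentRecord:
    """Row you would keep in your own database (hypothetical schema)."""
    object_key: str      # opaque key used in the bucket
    original_name: str   # name shown to the user on download
    content_type: str


def new_object_key(prefix: str = "docs") -> str:
    """Return an opaque key with no file name or extension in it."""
    return f"{prefix}/{uuid.uuid4().hex}"


class DocumentIndex:
    """In-memory stand-in for the DB table that maps keys back to names."""

    def __init__(self) -> None:
        self._by_key: Dict[str, DocumentRecord] = {}

    def register(self, original_name: str, content_type: str) -> DocumentRecord:
        """Assign a fresh opaque key to an uploaded document and remember it."""
        record = DocumentRecord(new_object_key(), original_name, content_type)
        self._by_key[record.object_key] = record
        return record

    def download_headers(self, object_key: str) -> Dict[str, str]:
        """Headers that put the name and type back when serving the object."""
        record = self._by_key[object_key]
        return {
            "Content-Type": record.content_type,
            # Assumes original_name contains no quotes or control characters.
            "Content-Disposition": f'attachment; filename="{record.original_name}"',
        }
```

Someone browsing the bucket would then only see keys like `docs/<random hex>`. When you generate a presigned link, for example with boto3's `generate_presigned_url`, the same name and type can be passed as response header overrides so the user still downloads a properly named file.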

I'm sure this is not comprehensive, but hopefully it helps. S3 will take work to build around, but it will give you every option you are ever likely to want. EFS could be a drop-in solution that is fast, friendly, and familiar, but it is more costly and gives you less control.

4 upvotes·40 views
Avatar of gobinathbk