What is EMRFS and what value does it add to the S3 file system?

Teepika R M
6 min read · Apr 4, 2020

EMRFS is not a separate file system; it is an implementation of the HDFS interface that EMR clusters use to read and write files directly to S3, and it adds strong consistency on top of S3. Without EMRFS, S3 provides read-after-write consistency for PUTs of new objects and eventual consistency for all other operations on objects, in all regions.

Let’s break the definition down and understand it part by part.

There are many consistency models supported by distributed systems; each defines a set of rules to be followed when a sequence of operations is executed across multiple nodes. Strong and eventual consistency are two of these models.

Eventual Consistency vs Strong Consistency

Eventual Consistency:

It’s used in distributed computing to achieve low latency. When a write/update/delete request is submitted to an eventually consistent system, the request is marked as complete even if it has been processed on only one of the replicas of the data. For example, consider a new file being written to an eventually consistent system configured with a replication factor of 3. The request receives a success response once just one copy of the data has been written; within a few seconds, the other two copies are made as per the configuration. Any read request submitted immediately after that success response may retrieve either stale data (served from one of the two in-progress copies) or the fully written copy of the file (the first copy). This approach boosts performance and availability, but at the risk of serving stale data for a short span.

Strong Consistency:

It’s used in distributed computing for use cases where data consistency matters more than high availability. When any request is submitted to a strongly consistent system, the request is marked as complete only once it has been processed on all of the replicas. A strongly consistent system has higher latency, but it provides a consistent view of the data on every read. Strong consistency includes read-after-write consistency, read-after-update consistency and read-after-delete consistency.

i) read-after-write consistency: After a new write request is submitted, the request is marked as complete only after the write has completed on all replicas.

ii) read-after-delete consistency: Same as read-after-write consistency, but the logic applies to the delete operation.

iii) read-after-update consistency: Any edits applied to an object are reflected in all subsequent read requests.
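The difference between the two models can be illustrated with a toy replicated store. This is a pure-Python sketch, not AWS code; the class and method names are invented for illustration, and replication lag is modelled by an explicit propagate() call rather than real background copying.

```python
# A toy 3-replica key-value store illustrating eventual vs strong writes.
class ReplicatedStore:
    def __init__(self, replicas=3):
        self.replicas = [dict() for _ in range(replicas)]
        self.pending = []          # writes not yet copied to all replicas

    def write_eventual(self, key, value):
        # Acknowledge as soon as ONE replica has the data (low latency).
        self.replicas[0][key] = value
        self.pending.append((key, value))
        return "OK"                # success before full replication

    def write_strong(self, key, value):
        # Acknowledge only after ALL replicas have the data (higher latency).
        for replica in self.replicas:
            replica[key] = value
        return "OK"

    def propagate(self):
        # Background replication catching up (the "within a few seconds" part).
        for key, value in self.pending:
            for replica in self.replicas:
                replica[key] = value
        self.pending.clear()

    def read(self, key, replica_index):
        # A read may land on any replica.
        return self.replicas[replica_index].get(key)

store = ReplicatedStore()
store.write_eventual("file.txt", "v1")
print(store.read("file.txt", 2))   # None: replica 2 not yet updated (stale view)
store.propagate()
print(store.read("file.txt", 2))   # 'v1': all replicas have converged

store.write_strong("other.txt", "v1")
print(store.read("other.txt", 2))  # 'v1' immediately: strong consistency
```

The eventual write trades a short window of staleness for a faster acknowledgement; the strong write pays the latency of touching every replica up front.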

As per the AWS documentation, S3 supports the following consistency model:

Amazon S3 provides read-after-write consistency for PUTS of new objects in your S3 bucket in all regions with one caveat. The caveat is that if you make a HEAD or GET request to the key name (to find if the object exists) before creating the object, Amazon S3 provides eventual consistency for read-after-write.

Amazon S3 offers eventual consistency for overwrite PUTS and DELETES in all Regions.

The statement above implies that S3 is eventually consistent for all operations except one scenario: when you create a new object and read it after creation, S3 provides strong consistency, i.e., read-after-write consistency. After the successful creation of a new object, reading it from any client always returns consistent data. There is, however, a special case of new-object creation that follows eventual consistency.

The exceptional scenario is this:

Suppose you access an object before it is created and get an object-not-found error, and then you create that object in the S3 bucket. This sequence of operations follows eventual consistency, not strong (read-after-write) consistency: an immediate read may return either a 404 Not Found or the newly written object, which is the expected behaviour of an eventually consistent system. The same eventual-consistency behaviour applies to UPDATES and DELETES in S3 as well.

Please find below screenshots that demonstrate the two object-creation scenarios explained above.

Scenario 1: Uploading a new object to S3 and retrieving it immediately.

S3 shows strongly consistent behaviour.

Scenario 2: Trying to access the object before creation, then creating it.

After a few seconds, when you try accessing the object again, it becomes available.

S3 shows eventually consistent behavior here.
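The two scenarios can also be reproduced from code. The sketch below uses a tiny in-memory stand-in that mimics the call shape of boto3’s S3 client (head_object/put_object/get_object) so it runs without AWS credentials; the class, the negative-cache mechanism, and the bucket/key names are invented purely to model the caveat described above, not real AWS behaviour.

```python
# In-memory model of the S3 consistency caveat: HEAD-ing a missing key
# before creating it makes the next read eventually consistent.
class FakeS3Client:
    def __init__(self):
        self.objects = {}
        self.negative_cache = set()   # keys HEAD-ed while they didn't exist

    def head_object(self, Bucket, Key):
        if (Bucket, Key) not in self.objects:
            self.negative_cache.add((Bucket, Key))
            raise KeyError("404 Not Found")
        return {"ContentLength": len(self.objects[(Bucket, Key)])}

    def put_object(self, Bucket, Key, Body):
        self.objects[(Bucket, Key)] = Body

    def get_object(self, Bucket, Key):
        # Scenario 2: a prior HEAD on the missing key means this read is
        # only eventually consistent -- it may still see a 404 once.
        if (Bucket, Key) in self.negative_cache:
            self.negative_cache.discard((Bucket, Key))  # converges afterwards
            raise KeyError("404 Not Found (eventually consistent read)")
        return {"Body": self.objects[(Bucket, Key)]}

s3 = FakeS3Client()

# Scenario 1: plain PUT of a new key, then GET -> read-after-write consistency.
s3.put_object(Bucket="demo-bucket", Key="new.txt", Body=b"hello")
print(s3.get_object(Bucket="demo-bucket", Key="new.txt")["Body"])  # b'hello'

# Scenario 2: HEAD before creation, then PUT, then GET -> may see a 404 first.
try:
    s3.head_object(Bucket="demo-bucket", Key="later.txt")
except KeyError:
    pass                                # object not found, as expected
s3.put_object(Bucket="demo-bucket", Key="later.txt", Body=b"world")
try:
    s3.get_object(Bucket="demo-bucket", Key="later.txt")
except KeyError as err:
    print(err)                          # stale 404: eventual consistency
print(s3.get_object(Bucket="demo-bucket", Key="later.txt")["Body"])  # b'world'
```

Against real S3 the stale 404 in scenario 2 is probabilistic rather than guaranteed on the first read; the model makes it deterministic so the caveat is easy to see.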

When do these types of eventual consistency cause problems?

There are jobs whose steps read as input the output files written by previous steps. In such cases, an inconsistent view of those output files produces erroneous results. To avoid this, an EMR cluster can be created with EMRFS consistent view enabled, which makes S3 behave as strongly consistent for that cluster.

What EMRFS does is create a DynamoDB table to track objects in S3. Whenever a new write request is submitted, EMRFS adds the object’s metadata to the DynamoDB table. When a read/list request is submitted for a tracked object or bucket, EMRFS cross-verifies the data retrieved from S3 against the DynamoDB table, waiting and retrying until all the data recorded in DynamoDB is returned. For example, when you write an object to an S3 bucket, EMRFS adds the object’s metadata to the DynamoDB table, recording the number of partition files created by that write request. When you then list the objects in that bucket, S3, being eventually consistent, may not immediately list all the partition files. In that case EMRFS cross-verifies the partition files retrieved from S3 against the entry in the DynamoDB table, and retries until all the partition files recorded there are retrieved or the retry limit expires.
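The core of that cross-verification loop can be sketched as follows. This is an illustrative model, not the actual EMRFS source: the function names and parameters are invented, list_s3() stands in for an S3 LIST call, and expected_keys stands in for the keys recorded in the DynamoDB metadata table. (On a real cluster the retry behaviour is configurable via the emrfs-site properties fs.s3.consistent.retryCount and fs.s3.consistent.retryPeriodSeconds.)

```python
import time

# Sketch of EMRFS's consistent-view check: list the S3 prefix, compare
# against the keys recorded in the metadata table, retry until they match
# or the retries run out.
def consistent_list(list_s3, expected_keys, retry_count=5, retry_period=0.1):
    for attempt in range(retry_count + 1):
        listed = set(list_s3())
        if expected_keys <= listed:      # everything tracked is visible
            return sorted(listed)
        if attempt < retry_count:
            time.sleep(retry_period)     # wait for S3 to converge
    raise RuntimeError(
        f"ConsistencyException: missing keys {sorted(expected_keys - listed)}")

# Demo: S3 "forgets" one partition file for the first two LIST calls.
calls = {"n": 0}
def flaky_list():
    calls["n"] += 1
    full = ["part-00000", "part-00001", "part-00002"]
    return full if calls["n"] > 2 else full[:-1]

print(consistent_list(flaky_list, {"part-00000", "part-00001", "part-00002"}))
# -> ['part-00000', 'part-00001', 'part-00002'] after two retries
```

If the retries are exhausted before the listing converges, EMRFS surfaces a consistency exception instead of silently handing the job a partial file list, which is exactly the failure mode the multi-step jobs above need to avoid.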

Please find below demo screenshots of how to enable EMRFS consistent view on EMR clusters and how it maintains a metadata table in DynamoDB.

Step 1: Creating an EMR cluster with EMRFS consistent view enabled.
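In the console this is a checkbox; when creating the cluster from the AWS CLI or API instead, consistent view can be switched on through the emrfs-site configuration classification. A fragment like the following would be passed in the cluster’s configurations (the retry values here are examples; EmrFSMetadata is the default metadata table name):

```json
[
  {
    "Classification": "emrfs-site",
    "Properties": {
      "fs.s3.consistent": "true",
      "fs.s3.consistent.retryCount": "5",
      "fs.s3.consistent.retryPeriodSeconds": "10",
      "fs.s3.consistent.metadata.tableName": "EmrFSMetadata"
    }
  }
]
```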

Step 2: Logging in to the EMR cluster through the AWS CLI once it becomes active.

Note: The DynamoDB table gets created only after operations are performed on S3 through EMR jobs, not immediately after cluster creation.

Step 3: Executing a Hive query to write data into S3.

Step 4: Navigating to the DynamoDB tab to see the EmrFSMetadata table and how it tracks the objects in S3.

Step 5: Executing a Pig step to write data to S3.

Step 6: Navigating to the EmrFSMetadata table to view the additional entries made for the Pig step.
