Magento 2 Bulk API & Asynchronous Import

Although Magento 2 brought many significant improvements over its first version, the new API is one of the most notable. Not only developers can benefit from it but also other users, including merchants, marketers, and backend administrators. With the new Swagger documentation, we got a REST interface that is more practical and user-friendly than ever before.

The new Magento 2 Bulk API now covers numerous areas, including the leading data transfer flows. Thus, you can leverage the technology to create and update products via asynchronous import. Both simple and complex items support automation. At the same time, the platform provides the ability to add new information for categories and customers. The Magento 2 Bulk API even lets you control the entire checkout process.

Preconditions

Back in 2018, Magento took a huge step forward with its APIs. However, there was a bottleneck that led to severe performance and scalability issues: the system didn’t have an optimized Bulk API for catalog import. As a result, connecting to ERP systems was problematic. The more products a website had and the more frequent the updates were, the more complexity its owners faced.

Creating a scalable Bulk API became the number one challenge. The new mechanism had to be powerful enough to handle the platform’s numerous calculations and database operations related to features such as tier and group pricing, multiple stores and currencies, and catalog and cart price rules. On top of that, the observer and plugin system forced developers to rely on default events when adding their custom logic.

A community-driven project was started on GitHub to solve these and other issues. Its core goals were:

  • An API that accepts multiple products and persists them at one time;
  • Products coming into Bulk API are persisted without deadlocks;
  • The Bulk API persistence is optimized for performance and remains backward-compatible with current customizations;
  • A client can send multiple Bulk API requests in parallel;
  • Magento can process multiple Bulk API workers in parallel;
  • Support for multiple entities;
  • Quick HTTP requests;
  • Asynchronous Bulk data processing.

The History of The Asynchronous Magento 2 Bulk API

For almost a decade, the Magento platform lacked a native way to connect a webshop to external ERP systems. Everyone thought that the launch of Magento 2 would solve the problem thanks to its renewed API coverage. However, several projects revealed that the system was not ready to support the challenging needs of its users. Even though the new API was clearly superior to its previous versions, some problems related to ERP integration remained.

Some external platforms cannot send requests to Magento one by one; instead, they push all objects to the e-commerce platform at once. The system then has to receive thousands of requests via an API endpoint. As you might have guessed, nothing good happens in such situations.

Comwrap proposed to create a new layer between the requests to the Magento 2 REST API and a RabbitMQ queue. The middleware caught all the requests and forwarded them to the queue. At the next stage, it read the messages back from RabbitMQ and executed them one by one.

Another enhancement was to add a log of all actions. As a result, it was possible to backtrack all the errors and spend less time solving them.

However, Comwrap was not the only company that suffered from the API limitations of Magento. Another developer that faced problems like that was Balance Internet. Furthermore, the company proposed a similar solution. Both wanted to contribute to the implementation, making Magento 2 more powerful and user-friendly.

Let’s see what changes were implemented.

New Endpoints

The project’s main goal was to change the way Magento perceives different calls. The Asynchronous API added REST URL routes that let the platform understand which calls are synchronous and which are asynchronous. Besides, the existing Magento Web APIs had to support asynchronous mode automatically. The first algorithm included the following steps:

  1. Asynchronous calls are sent to the message queue.
  2. A consumer reads them from the queue.
  3. The calls are executed one by one.

However, it was necessary to find a way to distinguish between the two call types. Comwrap developers proposed a new prefix. The idea was to mark asynchronous calls with /async placed before /V1 in the route, so that POST /V1/products becomes POST /async/V1/products.
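The prefix rule can be sketched in a few lines of Python. This is only an illustration of the naming convention, not Magento code; the helper name to_async_route is hypothetical.

```python
def to_async_route(sync_route: str) -> str:
    """Turn a synchronous REST route such as '/V1/products' into its async twin.

    Hypothetical helper: it only illustrates the '/async' prefix convention.
    """
    if not sync_route.startswith("/V1/"):
        raise ValueError(f"expected a route starting with /V1/, got {sync_route!r}")
    return "/async" + sync_route


print(to_async_route("/V1/products"))
print(to_async_route("/V1/products/:sku"))
```

Note that placeholders such as :sku stay untouched in plain asynchronous routes; only the prefix changes.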

As for other API requests, they remained unchanged. The same applied to the parameter logic and the request body, which were passed in the same way as for synchronous requests. The image below illustrates the complete workflow:

[Image: Magento 2 Bulk API & Asynchronous Import workflow]

Asynchronous Responses

Now, Magento creates a bulk operation for every asynchronous request. It is an entity that aggregates multiple operations. Besides, the entity tracks the aggregated status of requests. 

Thus, the bulk for a single asynchronous request consists of just one operation. In turn, each asynchronous request generates a Bulk UUID, which Magento returns in the response. Developers use it to track the status of the bulk operation.

Bulk API

Now, let’s explore another use case for Magento 2 asynchronous APIs. An external platform sends a large number of entities related to the same endpoint. With the old API, it would be necessary to invoke the endpoint multiple times. However, Comwrap developers proposed a more efficient algorithm. They offered a special type of API endpoint: the Bulk API. How did it change the game?

  1. Bulk API combined multiple entities related to the same type into an array.
  2. The array participated in a single API request.
  3. Next, the endpoint handler split the array into individual entities.
  4. After that, it sent them in separate messages to the Message Queue.
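The four steps above can be sketched as follows. This is a simplified model, not the Magento implementation: a Python deque stands in for RabbitMQ, and the function name and message layout are assumptions made for illustration.

```python
import json
import uuid
from collections import deque


def enqueue_bulk(entities, queue):
    """Split one bulk request body into one queue message per entity.

    Hypothetical sketch: 'queue' is any object with append(); the message
    fields are illustrative, not Magento's internal format.
    """
    bulk_uuid = str(uuid.uuid4())  # one UUID identifies the whole bulk
    for index, entity in enumerate(entities):
        message = {"bulk_uuid": bulk_uuid, "id": index, "body": entity}
        queue.append(json.dumps(message))
    return bulk_uuid


queue = deque()  # stand-in for the message queue
bulk_id = enqueue_bulk([{"sku": "A"}, {"sku": "B"}], queue)
print(bulk_id, len(queue))
```

A consumer would then pop the messages one at a time and execute each as if it were a separate synchronous call.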

Since the core AsynchronousOperations Magento module had been incorporated even before the contribution, implementing the Bulk API was not challenging at all. However, it was necessary to achieve the correct handling of Web API requests. Developers proposed new routes that carry the /async/bulk prefix, for example POST /async/bulk/V1/products.

A list of operation statuses, one per submitted item, was added to the response.

The new approach opened new possibilities for POST, PUT, or DELETE requests. From that point, developers were able to execute them like a bulk operation.

URL Parameters

It was also necessary to find the best way to handle placeholders for parameters inside endpoint URLs. Since requests were converted, parameters like “sku” and “optionId” were no longer required as input. Also, note that for Bulk API requests, the URL path was generated by replacing the colon “:” with the prefix “by”, so that :sku became bySku.

Status Endpoints 

Several calls were added to enable operation status tracking. They also introduced real operation responses so that it became possible to track progress or discover errors. Everything was based on the Bulk UUID that became the primary search parameter.

To find out the overall status of a bulk operation, request the Short Status: GET /V1/bulk/:bulkUuid/status.

The Detailed Status request also returns the operation statuses, together with detailed information about each operation: GET /V1/bulk/:bulkUuid/detailed-status.
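As a rough sketch, both status URLs can be built from the Bulk UUID like this. The host name and the UUID are placeholders, the /rest prefix assumes a default Magento REST setup, and the helper name is hypothetical.

```python
from urllib.parse import quote


def bulk_status_urls(base_url: str, bulk_uuid: str) -> dict:
    """Return the short and detailed status URLs for one bulk operation.

    Hypothetical helper; base_url and bulk_uuid are supplied by the caller.
    """
    root = f"{base_url.rstrip('/')}/rest/V1/bulk/{quote(bulk_uuid, safe='')}"
    return {
        "short": f"{root}/status",
        "detailed": f"{root}/detailed-status",
    }


urls = bulk_status_urls("https://shop.example.com/", "00000000-0000-0000-0000-000000000000")
print(urls["short"])
print(urls["detailed"])
```

A client would send an authenticated GET request to either URL and poll until every operation reports a final status.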

Swagger

A Swagger schema was another crucial feature that shaped asynchronous operations as they are now. The Magento 2 system offered documentation for the asynchronous endpoints in the form of Swagger UI, which made it possible to describe their input and output types.

Results

Of course, the Magento Message Queue and the RabbitMQ integration framework were implemented in Magento Commerce first. At some point, they were transferred to Magento Open Source. After three months of development, the project was completed with four pull requests covering all the improvements described in this chapter. All the benefits became a part of Magento 2.3 and later versions. However, that was not the end, since the bulk and asynchronous API still had open items:

  • Redis framework for message statuses;
  • Strictly typed status messages;
  • Various message queue framework improvements, etc.

Check the backlog to see what is already implemented. Check the full article for further information: Comwrap Collaborates with Magento on New Asynchronous Bulk API.

Magento 2 Bulk API Performance

Of course, the implementation of the Magento 2 Bulk API not only enabled new integration possibilities but also increased the platform’s performance. The elapsed time of the Asynchronous API is a bit less than that of the Sync API, since the asynchronous approach saves time during item creation. However, each API request still requires initializing the entire Magento instance. Besides, the difference between synchronous and asynchronous APIs depends on batch size: Async takes only slightly less time than Sync on big batches, while the difference is more significant on smaller ones.

As for the time of Bulk API, it is constant and almost independent of the number of items, since queueing each item only involves writing an operation status into the database and putting a message into RabbitMQ. As a result, the performance of Bulk API can be several times higher on big batches:

  • A 30% decrease in total time compared to Sync and a 41.5% decrease compared to Async for 100 items;
  • Up to an 84% decrease compared to Sync and 80% compared to Async for 1000 items.

However, on small batches that may contain a single product, both the Async and Bulk asynchronous methods are less efficient than Sync. The latter consumes fewer resources. There are situations when a small batch sent via one of the asynchronous methods spends some time in the RabbitMQ queue before processing begins.

The performance testing illustrates the following tendency: the larger the number of items, the greater the performance improvement you get with Bulk API. Thus, the implementation of Bulk API in Magento 2 was a revolutionary idea that was entirely worth the time and effort spent. You can see the detailed results of all performance tests here: Asynchronous Bulk API Performance Test.

Magento 2 Bulk API Endpoints

Now, let’s see how the official documentation describes the new Magento 2 Bulk API. The new technology differs from other REST endpoints in its ability to combine multiple calls of the same type into an array. This array is then executed as a single request (regardless of the number of calls). The system uses the endpoint handler to split it into individual entities. Next, it writes them as separate messages to the message queue.

As you can see, the initial idea was implemented without any changes. However, you must install and configure RabbitMQ before using the Bulk API; otherwise, messages won’t be processed as efficiently. After RabbitMQ is installed, start the consumer responsible for asynchronous and bulk API messages, async.operations.all, with the command bin/magento queue:consumers:start async.operations.all.
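If the consumer is started from a deployment script, the command line could be assembled roughly like this. The wrapper function is hypothetical; it only builds the argument list for the Magento CLI, and it assumes the script runs from the Magento root directory.

```python
import shlex


def consumer_command(consumer="async.operations.all", max_messages=None):
    """Build the argv list that starts a Magento queue consumer.

    Hypothetical wrapper; it assumes bin/magento is reachable from the
    current working directory.
    """
    args = ["bin/magento", "queue:consumers:start", consumer]
    if max_messages is not None:
        # Stop the consumer after it has processed this many messages.
        args.append(f"--max-messages={max_messages}")
    return args


print(shlex.join(consumer_command()))
print(shlex.join(consumer_command(max_messages=1000)))
```

The resulting list can be passed to subprocess.run(args, check=True) to launch the consumer.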

Routes

Another thing that didn’t change since the initial implementation of the Bulk API idea is the prefix. You need to add /async/bulk before /V1 of a synchronous endpoint route to call a Bulk endpoint (POST /async/bulk/V1/products).

Endpoint routes with input parameters still require additional changes. You need to replace the colon (:) with by. Also, edit the first letter of the parameter, changing it to uppercase.

For instance, take a synchronous route that contains the :sku and :entryId input parameters, such as PUT /V1/products/:sku/media/:entryId.

To create the corresponding Bulk route, we need to add the prefix (async/bulk/) and edit the input parameters :sku and :entryId. The new parameters will look as follows:

  • bySku
  • byEntryId

Thus, the complete bulk route becomes PUT /async/bulk/V1/products/bySku/media/byEntryId.
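The conversion rule described above (add the prefix, replace the colon with by, capitalize the first letter of the parameter) can be sketched as a small function. The function name is hypothetical, and the media-entry route is used here only as sample input.

```python
import re


def to_bulk_route(sync_route: str) -> str:
    """Convert a synchronous route into its bulk counterpart.

    Hypothetical helper illustrating the naming rule: ':sku' -> 'bySku'.
    """
    def rename(match):
        name = match.group(1)
        return "by" + name[0].upper() + name[1:]

    # Replace every ':param' placeholder, then add the bulk prefix.
    converted = re.sub(r":([A-Za-z_][A-Za-z0-9_]*)", rename, sync_route)
    return "/async/bulk" + converted


print(to_bulk_route("/V1/products/:sku/media/:entryId"))
```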

Payloads

It is also worth mentioning that a bulk request payload contains an array of request bodies.
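A minimal sketch of such a payload for POST /async/bulk/V1/products is shown below. Each array element is the body that a single synchronous POST /V1/products call would take; the SKUs, names, prices, and attribute set ID are made-up sample values, and the helper name is hypothetical.

```python
import json


def build_bulk_payload(products):
    """Wrap each product in its own request body and return the JSON array.

    Hypothetical helper; the product fields below are sample data only.
    """
    return json.dumps([{"product": product} for product in products], indent=2)


payload = build_bulk_payload([
    {"sku": "tshirt-s", "name": "T-shirt S", "attribute_set_id": 4, "price": 10, "type_id": "simple"},
    {"sku": "tshirt-m", "name": "T-shirt M", "attribute_set_id": 4, "price": 10, "type_id": "simple"},
])
print(payload)
```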

Responses

When it comes to the response, it contains an array that shows whether the call added each request to the message queue successfully or not.
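A client can inspect that array to find the items that were not queued. The sketch below assumes a response shaped like the one in the official documentation (a bulk_uuid, a request_items list with id and status fields, and an errors flag); the sample values and the function name are illustrative.

```python
def rejected_items(response: dict) -> list:
    """Return the ids of request items that were not accepted into the queue.

    Assumes the response layout described in the lead-in; treat the field
    names as an assumption if your Magento version differs.
    """
    return [
        item["id"]
        for item in response.get("request_items", [])
        if item.get("status") != "accepted"
    ]


sample_response = {
    "bulk_uuid": "00000000-0000-0000-0000-000000000000",  # placeholder UUID
    "request_items": [
        {"id": 0, "data_hash": "hash-0", "status": "accepted"},
        {"id": 1, "data_hash": "hash-1", "status": "rejected"},
    ],
    "errors": True,
}
print(rejected_items(sample_response))
```

An accepted item has only been queued; whether the operation itself succeeds is reported later by the status endpoints.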

DELETE requests

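Bulk DELETE requests follow the same rules. For example, you can delete several CMS blocks asynchronously with a single call to DELETE /async/bulk/V1/cmsBlock/byBlockId, passing the block IDs in the payload array.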

Store scopes

It is possible to specify a store code in the route of an asynchronous endpoint. As a result, it will operate on a specific store instead of the default one. For instance: POST /<store_code>/async/bulk/V1/products.

If you want to perform operations on all existing stores, specify the all store code: POST /all/async/bulk/V1/products.
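The placement of the store code in the URL can be sketched as follows. The host name and the german store code are placeholders, the /rest prefix assumes a default Magento REST setup, and the helper name is hypothetical.

```python
def scoped_bulk_url(base_url: str, route: str, store_code: str = "") -> str:
    """Build a bulk endpoint URL, optionally scoped to a store code.

    Hypothetical helper; an empty store_code targets the default store.
    """
    scope = f"/{store_code}" if store_code else ""
    return f"{base_url.rstrip('/')}/rest{scope}/async/bulk{route}"


print(scoped_bulk_url("https://shop.example.com", "/V1/products"))
print(scoped_bulk_url("https://shop.example.com", "/V1/products", "german"))
print(scoped_bulk_url("https://shop.example.com", "/V1/products", "all"))
```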

Creating/updating objects

Note that there are several rules to follow when you create or update objects using the Magento 2 Bulk API. If you accidentally omit the store code when creating a new product, Magento will create a new object with all values and set it globally. However, if this happens when you update a product, the values are updated for the default store only. To update values for all store scopes, use the “all” parameter. Use the “<store_code>” parameter to update values for the defined store only.

For more code examples and other nuances, read this article: Magento 2 Bulk API Endpoints.

Asynchronous Import

Now that we’ve briefly described what the Magento 2 Bulk API is and how to use it, we can proceed to the Magento 2 Asynchronous Import. The idea behind this project was to create Web API support for importing different types of data into Magento. First of all, developers wanted to replace the existing module, so they had to recreate its functionality with asynchronous processing in mind. Thus, the new extension covered all import features that were already available on the platform.

The developers decided to create a module distributed within the Magento/AsynchronousImport* directories. Another goal was to follow technical guidelines and recommendations of the Service Isolations approach. Unique business logic should be prepared for each module. And, what is even more important from the perspective of our article, renewed import processes should be run asynchronously via the Magento 2 Bulk API following these steps:

  1. A user uploads an import data file (CSV or other formats);
  2. The Asynchronous Import extension receives and validates the file;
  3. Next, the module returns a File UUID to the user;
  4. The user applies custom parameters if applicable using the File UUID;
  5. After that, the module implements the following actions:
    1. parses the file,
    2. splits it into single messages,
    3. sends the messages to the Asynchronous Bulk API of Magento 2;
  6. After the import is complete, it is possible to request the import status and resubmit failed objects (if there are any).
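The parse-and-split part of this flow (steps 3 and 5) can be sketched as follows. It is a simplified illustration rather than the module’s code: the function name, the File UUID format, and the message layout are all assumptions, and sending the messages to the Bulk API is left out.

```python
import csv
import io
import uuid


def split_import_file(csv_text: str):
    """Parse CSV text and return (file_uuid, messages), one message per row.

    Hypothetical sketch; the real module's identifiers and message format
    may differ.
    """
    file_uuid = str(uuid.uuid4())  # returned to the user after upload
    rows = csv.DictReader(io.StringIO(csv_text))
    messages = [
        {"file_uuid": file_uuid, "row": index, "data": dict(row)}
        for index, row in enumerate(rows, start=1)
    ]
    return file_uuid, messages


sample = "sku,name,price\nTS-S,T-shirt S,10.00\nTS-M,T-shirt M,10.00\n"
file_uuid, messages = split_import_file(sample)
for message in messages:
    print(message["row"], message["data"]["sku"])
```

Keeping the row number in each message makes it possible to report which lines failed and to resubmit only those.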

The development was split into two stages. During the first one, it was planned to implement endpoints to receive files, start processing, and receive statuses. The second phase was about building a UI for the Asynchronous Magento 2 import. Developers planned to create a separate Magento extension that would utilize Bulk API to communicate with the Asynchronous Magento 2 import module. You can find the project here: Async Import Wiki.

Were these goals implemented? The Asynchronous Import is already a part of the Magento 2 core, so the goals were achieved at least partly. There is still room for improvement; however, you can always leverage an alternative solution. Currently, we are working on the implementation of message queuing on RabbitMQ and support for Bulk API import in the Improved Import & Export extension. The module already supports all the core Magento 2 entities, multiple file formats, API and Google Sheet data transfers, as well as robust manual or preset-based mapping capabilities. At the same time, even without support for the Asynchronous Bulk API, the plugin shows high performance for files with thousands of records. Follow the link below for further information:

Get Improved Import & Export Magento 2 Extension

And don’t miss our Complete Guide to Magento 2 Product Import / Export.

Creating Products via Bulk APIs

You can use the Magento 2 Bulk APIs not only for import/export purposes but also for creating multiple customers and products, updating prices, and assigning numerous products to a specific warehouse in bulk. Thanks to the Bulk APIs, all these actions are handled as a single call.

The official Magento 2 documentation includes a tutorial that explains how to leverage the Magento 2 Bulk APIs. The material teaches how to create a configurable product. This product type is chosen since a configurable product is a parent product of multiple simple products. It results in a situation when a buyer must make at least one choice (usually, there are more choices) to add a product to the cart. 

For example, shoes come in a variety of sizes. If you are offering a model of sneakers in five sizes, then your configurable product consists of five simple products. If each size comes in two colors, then you have to deal with ten simple products (10 different combinations).
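The arithmetic is a plain Cartesian product of the option values, as the short sketch below shows. The sizes, colors, and SKU pattern are made-up sample data.

```python
from itertools import product

sizes = ["38", "39", "40", "41", "42"]  # five sample sizes
colors = ["black", "white"]             # two sample colors

# One simple product per size/color combination.
variants = [
    {"sku": f"sneaker-{size}-{color}", "size": size, "color": color}
    for size, color in product(sizes, colors)
]
print(len(variants))
```

Each dictionary corresponds to one simple product that would be linked to the configurable parent.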

As for the tutorial, it describes how to create a t-shirt that comes in three sizes but one color. It walks through creating the following:

  1. The configurable product with basic characteristics;
  2. A simple product for each size;
  3. The connection between simple products and the configurable one;
  4. An option that allows specifying a custom text on the shirt.

You can find the tutorial here: Create a configurable product using Bulk APIs.
