Building a stable and scalable system that communicates with various external API providers and keeps large-scale data current is an exciting programming challenge. Let’s go through some of the most important factors to consider when developing code that handles multiple API connections.
EmployeeReferrals is a global leader and the most trusted online talent referral platform. Synchronizing tens of thousands of job offers is a difficult task, especially when there are various data sources. Our task was to design a stable and extendable system that communicates with multiple external API providers and handles the workload appropriately.
We’ve been working with EmployeeReferrals for a few years, and our primary mission has been to enable over one million users to refer their friends to job offers and to ensure those offers are always up to date. In this article, I will show you what it takes to communicate effectively with multiple API providers in one application. There are many factors to consider when developing code that handles API connections. I will go through the ones that proved most important from our perspective.
Always look for maintainability and extendability
When we’re working on an application that relies heavily on multiple API providers, we must have a design that allows adding new integrations quickly and keeps the code in good shape.
Don’t repeat yourself
We always look for the same patterns in all API integrations that we handle. Although data comes from different sources and in various ways, we make sure that in the end it’s transformed into the same format and lands in our database.
In most cases, API integrations consist of the following elements:
- Authentication and authorization – it could be either simple API key communication, OAuth 2 protocol, or username-password protection; whenever possible, take your time to build bulletproof code that initializes the connection, so that you can easily reuse it in the future. In addition, it will speed up the development of new integrations and, more importantly, enhance data security during data synchronization.
- Parsing the data – there are only a handful of standard data formats served from API endpoints – CSV, XML, and JSON, to name a few. They haven’t changed for many years, so building standardized code to handle parsing is a good idea. With a tested solution, you won’t be surprised by encoding or missing-data issues along the way.
- Updating data within a database – we pull the data from APIs to parse it and then save it in our database, so we can serve it to our users. The database schema is the same for all integrations. Hence, we always recommend having a single point for the database transaction. That way it’s easier to maintain and more stable.
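The parsing and single-write-path ideas above can be sketched as follows. This is a minimal illustration, not our production code: the `JobOffer` fields, the function names, and the dictionary standing in for the database are all assumptions made for the example.

```python
import csv
import io
import json
import xml.etree.ElementTree as ET
from dataclasses import dataclass


@dataclass(frozen=True)
class JobOffer:
    """Common format every integration is converted to (hypothetical fields)."""
    external_id: str
    title: str


def parse_json(payload: str) -> list[JobOffer]:
    # Expects a JSON array of objects with "id" and "title" keys.
    return [JobOffer(str(item["id"]), item["title"].strip()) for item in json.loads(payload)]


def parse_csv(payload: str) -> list[JobOffer]:
    # Expects a header row with "id" and "title" columns.
    rows = csv.DictReader(io.StringIO(payload))
    return [JobOffer(row["id"], row["title"].strip()) for row in rows]


def parse_xml(payload: str) -> list[JobOffer]:
    # Expects <job> elements, each with <id> and <title> children.
    root = ET.fromstring(payload)
    return [
        JobOffer(job.findtext("id", default=""), job.findtext("title", default="").strip())
        for job in root.iter("job")
    ]


def save_offers(store: dict[str, JobOffer], offers: list[JobOffer]) -> None:
    """Single write path shared by all integrations; a dict stands in for the database."""
    for offer in offers:
        store[offer.external_id] = offer
```

Whatever the source format, every parser returns the same `JobOffer` records, so `save_offers` never needs to know which provider the data came from.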
Remember that you’ll be constantly learning new things by implementing the next integrations. Thanks to a good code design, you can immediately improve the code for all integrations based on learnings from a new implementation.
With multiple data sources, you can’t be 100% sure that the way of handling the process won’t change in the future. Therefore, it’s best to keep a flexible code design, for example, with an adapter or builder design pattern, in order to avoid problems when the communication changes.
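As a rough sketch of the adapter idea, the sync code below depends only on a small interface, and each provider hides its own quirks behind it. The provider names (`Acme`, `Globex`), their field names, and the in-memory inputs are hypothetical and stand in for real HTTP responses.

```python
from abc import ABC, abstractmethod


class ProviderAdapter(ABC):
    """Interface the sync code depends on; each provider gets its own adapter."""

    @abstractmethod
    def fetch_offers(self) -> list[dict]:
        """Return offers as dicts with 'external_id' and 'title' keys."""


class AcmeJsonAdapter(ProviderAdapter):
    def __init__(self, raw_items: list[dict]):
        self._raw_items = raw_items  # stands in for a decoded JSON response

    def fetch_offers(self) -> list[dict]:
        # This provider calls the title "name" and uses numeric ids.
        return [{"external_id": str(i["id"]), "title": i["name"]} for i in self._raw_items]


class GlobexCsvAdapter(ProviderAdapter):
    def __init__(self, raw_rows: list[tuple[str, str]]):
        self._raw_rows = raw_rows  # stands in for rows of a downloaded CSV file

    def fetch_offers(self) -> list[dict]:
        return [{"external_id": ref, "title": title} for ref, title in self._raw_rows]


def sync(adapters: list[ProviderAdapter]) -> list[dict]:
    """Collect offers from every provider without knowing provider details."""
    offers = []
    for adapter in adapters:
        offers.extend(adapter.fetch_offers())
    return offers
```

If a provider changes how it communicates, only its adapter changes; `sync` and everything downstream stay the same.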
Commonly, the technical details of an integration with the same provider differ between versions of its API. This also needs to be reflected in the file structure. To handle such cases, we stick to the following rules:
- Choose the correct namespace – the top namespace is always the integration name, so it can be easily located no matter what version of the integration we are looking for.
- Share the common elements for each version – in most cases, the parsing phase is the same, so we reuse the logic among many versions. Thanks to the design patterns, we can easily plug new parsing processes without reducing code readability.
- Remove outdated versions – if no company is using a given version of the API integration anymore, we remove its code in a single commit with a meaningful message, so that we can always restore it in the future if needed.
Reassuringly, what changes most over time is the authentication process. It happens because API providers improve the security of their solutions and switch to newer and better methods of protecting their clients’ data.
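One possible way to lay out versions under a single integration name is sketched below. The `acme` integration, its two versions, and the header formats are invented for illustration; the point is only that parsing lives in one shared place while each version overrides what actually differs.

```python
import json


class AcmeBase:
    """Logic shared by every version of the hypothetical 'acme' integration."""

    def parse(self, payload: str) -> list[dict]:
        return [{"external_id": str(o["id"]), "title": o["title"]} for o in json.loads(payload)]


class AcmeV1(AcmeBase):
    def auth_headers(self, credentials: dict) -> dict:
        # Assumed: version 1 authenticates with a static API key.
        return {"X-Api-Key": credentials["api_key"]}


class AcmeV2(AcmeBase):
    def auth_headers(self, credentials: dict) -> dict:
        # Assumed: version 2 switched to bearer tokens.
        return {"Authorization": f"Bearer {credentials['access_token']}"}


# The top-level key is the integration name, the second level is the version.
INTEGRATIONS = {"acme": {"v1": AcmeV1, "v2": AcmeV2}}


def build_integration(name: str, version: str):
    """Look up an integration by name first, then by version."""
    return INTEGRATIONS[name][version]()
```

Removing an outdated version then means deleting one class and one registry entry, which fits naturally into a single commit.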
Do the right testing
Sometimes it’s not possible to build integration tests against test versions of API environments, as not every provider offers one. Still, it’s always necessary to write unit tests that verify the connection logic and can alert us to issues before our users do.
When testing API connections, we stick to the following rules:
- Use as much real data as possible – instead of hand-written stubs, use prerecorded API responses and files whose structure matches that of the real files.
- Keep your tests up to date – record API responses and update test files to ensure that your code handles the recent integration version correctly.
- Try to test all possible scenarios – having a happy path tested is better than not having tests at all, but it is always better to handle all cases and be aware of all kinds of errors that appear.
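A minimal sketch of these rules with Python's built-in `unittest` follows. The `parse_offers` function and the payload shapes are hypothetical; in a real suite the recorded strings would live in fixture files captured from the actual API rather than inline.

```python
import json
import unittest


def parse_offers(payload: str) -> list[dict]:
    """Hypothetical parser under test."""
    data = json.loads(payload)
    return [{"external_id": str(o["id"]), "title": o["title"]} for o in data["offers"]]


# Stand-ins for responses recorded from the provider and stored as fixture files.
RECORDED_OK = '{"offers": [{"id": 7, "title": "Data Engineer"}]}'
RECORDED_EMPTY = '{"offers": []}'
RECORDED_MISSING_TITLE = '{"offers": [{"id": 7}]}'


class ParseOffersTest(unittest.TestCase):
    def test_happy_path(self):
        expected = [{"external_id": "7", "title": "Data Engineer"}]
        self.assertEqual(parse_offers(RECORDED_OK), expected)

    def test_empty_feed(self):
        self.assertEqual(parse_offers(RECORDED_EMPTY), [])

    def test_missing_field_is_not_silently_ignored(self):
        with self.assertRaises(KeyError):
            parse_offers(RECORDED_MISSING_TITLE)
```

Re-recording the fixtures from time to time keeps the tests aligned with what the provider currently returns.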
Proper testing is also helpful during the software upgrade process. With time, the libraries that handle the API connection can change their inner implementation regarding params or headers parsing and passing.
Take your time to discover the API well
Based on our experience, reading documentation is insufficient to build a stable API connection. To design the API code in the best possible way, we should try a little harder.
Pay the same amount of attention to the features and the limitations of the API
Knowing what we can do with a particular API is as important as knowing what we can’t. The most common API limitations include:
- API calls limit – some providers cap usage at, for example, 10k calls per 24 hours. You need to take this into account if you perform a lot of calls. It’s easy to forget about it later when adding new features. To avoid that, build a method that returns the number of calls still available in the current period and add basic monitoring around it.
- Rate limiting – performance is super important, but sometimes we just can’t add more background workers to call more endpoints simultaneously. Therefore, rate limiting is another factor we should consider when writing code. For example, instead of breaking the sync, we should wait for a while to be able to retry the call.
- Timeouts – you may experience timeouts when you request too many records at once. In many cases, it’s hard to specify exactly when to expect such a situation, so you have to test to see where the borderline is.
With the above limitations in mind, you can design the API communication that isn’t only rich in features but also reliable.
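The call limit and rate limiting points can be sketched roughly as follows. Everything here is an assumption for illustration: `CallBudget`, the `RateLimited` error with its `retry_after` attribute, and the fixed-window accounting are not tied to any particular provider or HTTP client.

```python
import time


class CallBudget:
    """Tracks calls used within a fixed window (for example 10k calls per 24 hours)."""

    def __init__(self, limit: int, period_seconds: float, clock=time.monotonic):
        self.limit = limit
        self.period_seconds = period_seconds
        self._clock = clock
        self._window_start = clock()
        self._used = 0

    def _roll_window(self) -> None:
        # Start a fresh window once the current period has elapsed.
        if self._clock() - self._window_start >= self.period_seconds:
            self._window_start = self._clock()
            self._used = 0

    def remaining(self) -> int:
        self._roll_window()
        return self.limit - self._used

    def record_call(self) -> None:
        self._roll_window()
        if self._used >= self.limit:
            raise RuntimeError("API call budget exhausted for this period")
        self._used += 1


class RateLimited(Exception):
    """Hypothetical error a client could raise on a rate-limit response."""

    def __init__(self, retry_after: float):
        super().__init__(f"rate limited, retry after {retry_after}s")
        self.retry_after = retry_after


def call_with_rate_limit(call, budget: CallBudget, max_attempts: int = 3, sleep=time.sleep):
    """Count every attempt against the budget and wait when the provider asks us to."""
    for attempt in range(1, max_attempts + 1):
        budget.record_call()
        try:
            return call()
        except RateLimited as error:
            if attempt == max_attempts:
                raise
            sleep(error.retry_after)  # wait instead of breaking the sync
```

Exposing `remaining()` makes it easy to log or alert on the budget before a new feature quietly uses it up.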
Keep up with the changes in API
The API's structure and the way it works can change over time, so you need to learn about those changes early enough to reflect them in your code. Usually, the provider notifies everyone who pays for API access, but it’s still worth subscribing to their newsletter (if there is one) and regularly checking their blog or other communication channels.
Don’t take documentation for granted
It often happens that some of the features of the API are not documented well or not documented at all. In the past, we had many situations where the documentation was outdated, which slowed down the development. Trust your own experience, and don’t hesitate to contact support if you spot an endpoint that doesn’t work the way the documentation describes.
Another good practice is to check the issue tracker if it’s publicly available. Other users may have run into the same problems before you, and they might have already found a solution that is not included in the official documentation.
Monitor the process
Once your code is ready to use the API and pull or push the data, it’s just the beginning of a journey. You need to know what is happening during the sync, expect errors, and act appropriately when they appear. A poorly designed API integration can cause frustration and increased error-tracking software costs.
Handle errors properly
Not all errors need to be fixed or investigated. Some of them will disappear after you retry the API call. However, when building the API code, we always stick to the following rules:
- Don’t catch generic errors – if you cannot tell precisely what error the code can throw, don’t catch it. Catching generic errors tends to hide the real issue and make the debugging process difficult.
- Let the errors be raised – if you want the background worker to be retried, don’t silently catch the error. Instead, allow the error to be raised, record it in the application logs, and retry the worker, but don’t report it to the error tracking software yet.
- Know when an error should be reported – always set the exact number of possible retries, and if retries don’t help, log the error in the tracking software.
Errors are integral to every API, and you just need to treat them as another way of passing information back to your system.
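The three rules above could look roughly like this in code. `TransientApiError` and the `report` callback (standing in for an error tracking client) are hypothetical names chosen for the sketch.

```python
import logging

logger = logging.getLogger("api_sync")


class TransientApiError(Exception):
    """Hypothetical error for failures worth retrying, such as timeouts."""


def run_with_retries(job, max_retries: int, report):
    """Run job; retry transient errors up to max_retries, then report and re-raise."""
    for attempt in range(max_retries + 1):
        try:
            return job()
        except TransientApiError as error:  # a specific error, never a blanket catch
            # Every failure goes to the plain logs.
            logger.warning("attempt %d failed: %s", attempt + 1, error)
            if attempt == max_retries:
                report(error)  # only now does the error reach the tracker
                raise
```

Any other exception type is not caught at all, so unexpected failures surface immediately instead of being hidden behind the retry logic.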
Log the progress
When many API calls are part of a complete data sync, it’s always wise to implement a simple monitoring code that will log the start date, end date, errors count, and other helpful information. The type and amount of information depend on your needs, so there is no golden rule here.
The progress information is beneficial when checking how the process is going at the very moment and when debugging or looking for performance issues.
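A simple progress record might look like the sketch below. The `SyncRun` fields and the use of `ValueError` as the "bad page" signal are assumptions; in practice the record would be persisted or logged rather than kept only in memory.

```python
from dataclasses import dataclass, field
from datetime import datetime, timezone
from typing import Optional


@dataclass
class SyncRun:
    """Progress record for one full sync with one provider (hypothetical fields)."""
    provider: str
    started_at: datetime = field(default_factory=lambda: datetime.now(timezone.utc))
    finished_at: Optional[datetime] = None
    processed: int = 0
    errors: int = 0

    def finish(self) -> None:
        self.finished_at = datetime.now(timezone.utc)

    def summary(self) -> str:
        status = "finished" if self.finished_at else "running"
        return f"{self.provider}: {status}, processed={self.processed}, errors={self.errors}"


def sync_pages(run: SyncRun, pages, handle_page) -> SyncRun:
    """Process pages one by one, counting records and failures along the way."""
    for page in pages:
        try:
            run.processed += handle_page(page)  # handle_page returns records handled
        except ValueError:  # assumed to signal a page that could not be parsed
            run.errors += 1
    run.finish()
    return run
```

Calling `summary()` while the sync is still in progress shows the current counters, which covers the "how is it going right now" question as well as later debugging.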
Pay attention to the performance from day one
We always think about performance, even when we write relatively simple code to handle API communication. Such a mindset is a must when working with data, because its volume can change at any time. Follow these rules if you don’t want API calls to hurt your application’s performance.
Delegate calls to the background
A single background job is a perfect application unit for calling the API. However, if you need to know the moment when all API-related calls are done, you can group the jobs into a batch and attach a callback that runs when the batch completes, provided your background job library supports it.
When API integration happens behind the scenes, you have more control over it, and the end-user only knows the end result. Moreover, with such an approach, you can also increase or decrease the number of workers in real-time, depending on the needs.
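To illustrate the batch-with-callback idea without depending on a specific job queue, the sketch below uses a thread pool from Python's standard library as an in-process stand-in for background workers. The `run_batch` function and its `on_complete` callback are hypothetical names.

```python
from concurrent.futures import ThreadPoolExecutor, as_completed


def run_batch(jobs, on_complete, max_workers: int = 4):
    """Run API jobs in worker threads; call on_complete once every job is done."""
    results, failures = [], []
    with ThreadPoolExecutor(max_workers=max_workers) as pool:
        futures = [pool.submit(job) for job in jobs]
        for future in as_completed(futures):
            error = future.exception()
            if error is None:
                results.append(future.result())
            else:
                failures.append(error)  # kept for the callback, not swallowed
    # All jobs have finished at this point, so the callback sees the whole batch.
    on_complete(results, failures)
```

The `max_workers` argument plays the role of the worker count mentioned above: raising or lowering it changes how many calls run at the same time.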
An example from our team – when we were working on a trading platform, we decided to dynamically assign the number of workers per process so that the most time-consuming tasks finished first, and then reassign the freed workers to less complicated tasks.
Make retries your friends
Retries used with caution can improve the performance of your API integration. Knowing when to retry the call and how many times is essential. If you decide to use the retry mechanism, try to isolate the API part as much as possible to avoid data duplication when retrying the call.
Timeouts are the best candidates for retries, but it may be worth checking other errors as well. It all depends on the context. First, collect all information about the API, then predict all possible scenarios, and design the system to handle all of them. Ideally, you wouldn't need to investigate a problem unless it’s critical.
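One way to make retries safe against duplication is to write data idempotently, keyed by the provider's own identifier. The sketch below uses SQLite from the standard library purely for illustration; the `offers` table and its columns are assumptions.

```python
import sqlite3


def open_store() -> sqlite3.Connection:
    """In-memory database with a key that identifies an offer per provider."""
    conn = sqlite3.connect(":memory:")
    conn.execute(
        "CREATE TABLE offers (provider TEXT, external_id TEXT, title TEXT, "
        "PRIMARY KEY (provider, external_id))"
    )
    return conn


def upsert_offers(conn: sqlite3.Connection, provider: str, offers: list[dict]) -> None:
    """Write a page of offers; running it twice leaves the same rows behind."""
    with conn:  # one transaction per page: it either fully applies or rolls back
        conn.executemany(
            "INSERT OR REPLACE INTO offers (provider, external_id, title) VALUES (?, ?, ?)",
            [(provider, o["external_id"], o["title"]) for o in offers],
        )
```

Because the write is keyed on `(provider, external_id)`, a job that is retried after a timeout simply overwrites the rows it may already have written.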
Last but not least: take security seriously
Don’t store sensitive information like passwords, private keys, or API keys as plain text in the database or elsewhere. The security of the credentials is as crucial as the performance, stability, and extendability of the integration.
When you plan to build an interface where clients can save their API credentials by themselves, make sure that your application is served over HTTPS and protected from CSRF, XSS, and SQL injection attacks.
Unlock the power of APIs for your product
Using APIs, we can connect different systems in a fairly easy, quick, and cost-effective way. Hence, our applications become more powerful and competitive. If you need help in setting up a system to manage multiple third-party APIs, let’s see how we can work together!