Open source solutions for data processing and analysis

There are many open source solutions on the market today for processing and analyzing data. They let you perform complex calculations and solve a wide variety of tasks, from compiling operational reports to processing large amounts of data. Their functionality largely overlaps, but some solutions are better suited to particular functions than others.

For example, Jupyter is well suited to experimental calculations and interactive computing, while Apache Airflow is suitable for scheduling and monitoring static, recurring data processing.

A platform like Knime allows you to perform a complete data analysis cycle.
But Knime does not provide sufficient development flexibility: the platform is too rigid, and it is difficult to integrate into existing infrastructure.

What did we lack?

Our company, 54origins, carries out a large amount of data analysis research for companies in various fields. At some point, after 1-2 years of development, these projects began to turn into chaos, and it became very difficult to keep them running reliably in production, especially systems with critical results (for example, exchange trading robots). In addition, we lacked custom interfaces, levels of logic separation, the ability to scale computing power, and overall system flexibility. To organize stable work and solve these standard problems, we developed a specialized framework for ourselves and our customers, which we called Computations54.

Computations54 is a Python-based framework that allows you to process large amounts of data, perform complex calculations, compile the results into reports, and make accurate forecasts on that basis, with constant monitoring of the integrity of the results.

We built three levels of logic into Computations54:

  • shared core
  • sub core for specific tasks
  • core for calculations
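
One way to picture the three levels is as layers that build on each other. The sketch below is purely illustrative: the class names `SharedCore`, `TextSubCore`, and `WordFrequency`, and the use of inheritance to connect the layers, are assumptions made for this example and do not show the actual Computations54 API.

```python
class SharedCore:
    """Level 1 (hypothetical): services shared by every computation."""

    def __init__(self):
        self.log = []

    def record(self, message):
        # Keep a simple in-memory log of what each computation did.
        self.log.append(message)


class TextSubCore(SharedCore):
    """Level 2 (hypothetical): helpers for one family of tasks, here text."""

    def tokenize(self, text):
        return text.lower().split()


class WordFrequency(TextSubCore):
    """Level 3 (hypothetical): one concrete calculation."""

    def run(self, text):
        counts = {}
        for token in self.tokenize(text):
            counts[token] = counts.get(token, 0) + 1
        self.record(f"counted {len(counts)} distinct words")
        return counts


calc = WordFrequency()
print(calc.run("Buy buy sell"))
print(calc.log)
```

With this kind of separation, a change to logging touches only the shared layer, and a new text calculation reuses the tokenizer without copying it.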

The framework provides the reliability needed for critical computing, such as stock market calculations and medical data analysis. You can move directly from experiments to practical use of your calculations. Computations54 is used for high-stakes calculations where confidence in the reliability of the results is required.

Since the framework is based on Python, you can use all the advantages of the language and all of its libraries for machine learning and data processing, such as:

  • Pandas
  • OpenCV
  • Tesseract
  • Keras
  • NLTK and many others.

Jupyter is cool! But what can't Jupyter do?

Jupyter is an absolutely awesome tool for draft calculations and experiments. But Jupyter is not suitable for production systems, especially for such critical things as exchange calculations and medical analytics: an error in the system configuration can lead to disastrous results, and its reliability leaves much to be desired.

Jupyter is suitable for experimental work, while Computations54 allows you to run stable calculations and be sure of the result. Computations54 is a framework that lets you take experiments performed in Jupyter into production.

Key features and advantages of Computations54

The main function of Computations54 is the intelligent processing and analysis of large amounts of data, followed by the identification of patterns and trends that help you make the right management decisions.

Computations54 processes various types of data: text, numeric, and others. By converting data into a structured form, the service makes any information available for analysis. You only need to upload data to Computations54, and it will start processing automatically: structuring, sorting, analyzing, and making a forecast. Computations54 supports different calculation methods and different approaches to performing analysis.
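
The processing steps named above (structuring, sorting, analyzing, forecasting) can be sketched in a few lines of plain Python. This is a minimal illustration, not the framework's implementation: the input format (`YYYY-MM-DD,value` lines), the function names `structure` and `analyze`, and the simple drift forecast are all assumptions chosen for the example.

```python
from datetime import date
from statistics import mean


def structure(raw_lines):
    """Turn raw 'YYYY-MM-DD,value' lines into (date, float) records."""
    records = []
    for line in raw_lines:
        parts = line.split(",")
        if len(parts) != 2:
            continue  # skip lines that do not have exactly two fields
        try:
            day = date.fromisoformat(parts[0].strip())
            value = float(parts[1])
        except ValueError:
            continue  # skip unparseable dates or numbers
        records.append((day, value))
    return records


def analyze(raw_lines):
    """Structure, sort, summarize, and make a one-step drift forecast."""
    records = sorted(structure(raw_lines))  # sorts by date, then value
    values = [value for _, value in records]
    if len(values) < 2:
        raise ValueError("need at least two valid records")
    # Drift forecast: last value plus the average step from first to last.
    step = (values[-1] - values[0]) / (len(values) - 1)
    return {
        "records": records,
        "mean": mean(values),
        "forecast": values[-1] + step,
    }


raw = ["2024-01-03, 13", "2024-01-01,10", "not a record", "2024-01-02, 12"]
result = analyze(raw)
print(result["forecast"])  # 14.5
```

The malformed line is dropped during structuring, so the later steps only ever see clean, ordered records.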

A lot of attention was paid to developer convenience: team members can discuss Computations, assign tasks, and save results.

Advantages of Computations54:

  1. High potential for customization and scalability. You can configure calculation methods for specific tasks, analyze and structure huge amounts of text or numeric data, and output the processed information in the selected format.
  2. Multiprocessing system. For example, if 50,000 files take 40 hours to process sequentially, multiprocessing lets them be processed in parallel by 5 worker processes, which significantly reduces the total processing time.
  3. Internal monitoring that allows you to analyze the reliability of results.
  4. Multiple levels of logic.
  5. Adaptation for different types of calculations (GPU, Intel MKL).
  6. Cluster computing with an extensible cluster.
  7. Extreme flexibility in programming.
  8. Data integrity control.
  9. The ability to create control calculations that the system uses as a reference.
  10. The ability to clone Computations and create various solutions based on them.
  11. Built-in functions with support for GPU calculations.
  12. Connection to many databases from different sources.
  13. The ability for Computations to interact with each other via the API and form cascades.
  14. The ability to create governing Computations for building complex computing systems and separating teams.
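
The multiprocessing point in the list above can be illustrated with Python's standard `concurrent.futures` module. This is a minimal sketch under stated assumptions, not the Computations54 scheduler: `count_words` is a hypothetical stand-in for the real per-file work, and `process_all` with its `executor_factory` parameter is a name invented for this example.

```python
from concurrent.futures import ProcessPoolExecutor, ThreadPoolExecutor


def count_words(text):
    """Hypothetical stand-in for the real per-file processing step."""
    return len(text.split())


def process_all(documents, workers=5, executor_factory=ProcessPoolExecutor):
    """Process documents in parallel and return results in input order.

    ProcessPoolExecutor suits CPU-bound work; ThreadPoolExecutor can be
    passed instead for I/O-bound work. On platforms that start workers
    with "spawn", call this from under an `if __name__ == "__main__":`
    guard when using processes.
    """
    with executor_factory(max_workers=workers) as pool:
        return list(pool.map(count_words, documents))
```

Calling `process_all(docs)` with the default arguments spreads the work over 5 worker processes. With ideal scaling that would cut a 40-hour sequential job to about 8 hours, although real speedups are usually lower because of startup and data transfer overhead.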

Using Computations54 minimizes the impact of the human factor on company processes, which significantly reduces the likelihood of errors.
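
Data integrity control and control calculations, both listed among the advantages, are examples of checks that remove manual verification. The sketch below shows one generic way to do this with the standard library; the function names `fingerprint` and `verify` and the returned status strings are assumptions for this example and are not taken from Computations54.

```python
import hashlib
import math


def fingerprint(data):
    """Return a SHA-256 hex digest that changes if the input bytes change."""
    return hashlib.sha256(data).hexdigest()


def verify(data, expected_fingerprint, compute, control_value, rel_tol=1e-9):
    """Check input integrity, then compare the result with a control value."""
    if fingerprint(data) != expected_fingerprint:
        return "input changed"
    result = compute(data)
    if not math.isclose(result, control_value, rel_tol=rel_tol):
        return "result deviates from control"
    return "ok"


def total(data):
    """Hypothetical calculation: sum comma-separated integers."""
    return sum(int(x) for x in data.decode().split(","))


data = b"1,2,3,4"
stored = fingerprint(data)
print(verify(data, stored, total, control_value=10))        # ok
print(verify(b"1,2,3,5", stored, total, control_value=10))  # input changed
```

A check like this runs on every execution, so a silently modified input file or a drifting result is flagged without anyone having to inspect the numbers by hand.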

Alternative to Apache Airflow

Computations54 is an alternative to Apache Airflow and can be used to process any data within the organization. With Computations54, you can design, schedule, and monitor complex workflows much more dynamically and faster than with Airflow.

An extremely important function is the ability to run graph data analysis and create workflows in the form of directed acyclic graphs (DAGs) of tasks, as in Apache Airflow.
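
To make the idea of a DAG of tasks concrete, here is a minimal sketch that runs tasks in dependency order using `graphlib.TopologicalSorter` from the Python standard library (Python 3.9+). It is not the Computations54 or Airflow API: the `run_dag` function, the task names, and the convention that each task receives the dictionary of earlier results are assumptions for this example.

```python
from graphlib import TopologicalSorter  # standard library, Python 3.9+


def run_dag(tasks, dependencies):
    """Run every task after the tasks it depends on.

    tasks:        task name -> function taking the dict of earlier results
    dependencies: task name -> set of task names that must finish first
    """
    # Include tasks without dependencies so that none is skipped.
    graph = {name: dependencies.get(name, set()) for name in tasks}
    # static_order() raises graphlib.CycleError if the graph has a cycle.
    order = list(TopologicalSorter(graph).static_order())
    results = {}
    for name in order:
        results[name] = tasks[name](results)
    return order, results


tasks = {
    "load": lambda r: [3, -1, 4, -1, 5],
    "clean": lambda r: [x for x in r["load"] if x >= 0],
    "total": lambda r: sum(r["clean"]),
    "count": lambda r: len(r["clean"]),
    "report": lambda r: f"mean={r['total'] / r['count']}",
}
dependencies = {
    "clean": {"load"},
    "total": {"clean"},
    "count": {"clean"},
    "report": {"total", "count"},
}

order, results = run_dag(tasks, dependencies)
print(results["report"])  # mean=4.0
```

Because the graph is acyclic, a valid execution order always exists, and independent tasks such as `total` and `count` could in principle run in parallel.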

At this stage Computations54 supports only the Python programming language, but its task execution speed is higher than Airflow's thanks to an optimized calculation controller. Our framework is therefore suitable for fast-changing, variable processes, including those that rely on metaprogramming, while Apache Airflow is suitable for much more static tasks.

Conclusions and scope of Computations54

Big data mining can be used in many fields, so Computations54 has a very broad scope of application. The service can be used in any activity where it is necessary to analyze data, perform calculations of any complexity, and make forecasts based on the results.

Computations54 is used in investment funds that trade on the stock exchange. Computations54 was also used by The University of Chicago Booth School of Business to analyze and identify problems with banks based on open reporting.

Medical institutions store and generate large amounts of data containing important patient information in both structured and unstructured form. Intelligent processing of these data sets can increase the efficiency of medical care by automating various tasks (predicting the development of diseases, making a diagnosis, prescribing treatment, etc.). Computations54 can also be used in economic research, business, banking, and real estate valuation.

Using Computations54 gives you confidence in the correctness of the final result of your calculations. Our tool is intended not only for scientific experiments; it has also proved its effectiveness in practical applications.