  1. OPEN SOURCE SOLUTIONS FOR DATA PROCESSING AND ANALYSIS
  2. WHAT WERE WE MISSING?
  3. JUPYTER IS VERY GOOD! BUT WHAT CAN'T JUPYTER DO?
  4. KEY FEATURES AND BENEFITS OF COMPUTATIONS54
  5. ALTERNATIVE TO APACHE AIRFLOW
  6. CONCLUSIONS AND AREAS OF APPLICATION

OPEN SOURCE SOLUTIONS FOR DATA PROCESSING AND ANALYSIS

Today there are many open-source solutions for data processing and analysis on the market. They let you perform complex calculations and solve a wide variety of problems, from compiling operational reports to processing large volumes of data. Their functionality is broadly similar, but some solutions are better suited to particular tasks than others.

For example, Jupyter is good for experimental and interactive calculations, while Apache Airflow is suited to scheduling and monitoring static, recurring data processing pipelines.

A platform such as KNIME supports the full data analysis cycle. However, KNIME offers limited development flexibility: it is a rigid platform and is difficult to integrate into existing infrastructure.

WHAT WERE WE MISSING?

Our company, 54origins, carries out a large amount of data analysis research for companies in a variety of industries. At some point, after one to two years of development, all of these projects began to descend into chaos, and it became very difficult to keep them running in production, especially systems whose results are critical (for example, stock exchange trading robots). We also lacked custom interfaces, layers of logic separation, the ability to scale computing power, and system flexibility. To ensure stable operation and solve these recurring problems, we developed a specialized framework for ourselves and our clients, which we called Computations54.

Computations54 is a Python framework for processing large amounts of data, carrying out complex calculations, compiling the results into reports and, on that basis, making accurate forecasts, while continuously monitoring the integrity of the results.
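The article does not describe how Computations54 checks result integrity internally. One common approach, sketched here under that caveat, is to fingerprint each result with a SHA-256 digest over a canonical serialization and re-check the digest before the result is reused. `fingerprint` and `verify` are hypothetical helpers, and results are assumed to be JSON-serializable.

```python
import hashlib
import json


def fingerprint(result: dict) -> str:
    """Return a SHA-256 hex digest of a JSON-serializable result (hypothetical helper).

    sort_keys=True and fixed separators make the serialization canonical,
    so equal results always produce the same digest.
    """
    payload = json.dumps(result, sort_keys=True, separators=(",", ":"))
    return hashlib.sha256(payload.encode("utf-8")).hexdigest()


def verify(result: dict, expected_digest: str) -> bool:
    """Return True if the result still matches the digest recorded earlier."""
    return fingerprint(result) == expected_digest


if __name__ == "__main__":
    # Illustrative data only.
    report = {"ticker": "ABC", "forecast": [101.2, 102.5]}
    digest = fingerprint(report)
    print(verify(report, digest))  # True
    report["forecast"][1] = 999.0  # simulate silent corruption of a stored result
    print(verify(report, digest))  # False
```

Sorting the keys makes the digest independent of dictionary order, so only a real change in the data changes the fingerprint.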

We built three levels of logic into Computations54:

  • common core
  • subkernel for specific tasks
  • calculation kernel
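A minimal sketch of how three such levels might be layered as a Python class hierarchy. The class names `CommonCore`, `NumericSubkernel` and `MeanCalculation` are hypothetical illustrations, not the framework's actual classes: the common core owns shared services (here, a run log), the subkernel owns logic shared by a family of tasks (here, cleaning numeric input), and the calculation kernel contains only the calculation itself.

```python
from abc import ABC, abstractmethod
from typing import Any


class CommonCore(ABC):
    """Level 1 (hypothetical): services shared by every computation."""

    def __init__(self) -> None:
        self.log: list[str] = []

    def record(self, message: str) -> None:
        # Shared monitoring hook available to all lower levels.
        self.log.append(message)

    def run(self, data: Any) -> Any:
        # Fixed execution flow: prepare the input, then calculate.
        self.record("start")
        result = self.calculate(self.prepare(data))
        self.record("done")
        return result

    @abstractmethod
    def prepare(self, data: Any) -> Any: ...

    @abstractmethod
    def calculate(self, data: Any) -> Any: ...


class NumericSubkernel(CommonCore):
    """Level 2 (hypothetical): logic shared by a family of numeric tasks."""

    def prepare(self, data: list) -> list[float]:
        # Drop missing values and refuse to run on empty input.
        cleaned = [float(x) for x in data if x is not None]
        if not cleaned:
            raise ValueError("no numeric values to process")
        return cleaned


class MeanCalculation(NumericSubkernel):
    """Level 3 (hypothetical): the concrete calculation kernel."""

    def calculate(self, data: list[float]) -> float:
        return sum(data) / len(data)


if __name__ == "__main__":
    calc = MeanCalculation()
    print(calc.run([1, 2, None, 3]))  # 2.0
    print(calc.log)  # ['start', 'done']
```

With this split, a new calculation kernel only has to implement `calculate`; input validation and monitoring are inherited from the levels above.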

The framework provides the reliability needed for mission-critical computing such as stock trading and medical data analysis. You can move directly from experiments to production use of the same calculations. Computations54 is used for critical calculations where confidence in the reliability of the results is essential.

Because the framework is written in Python, you can take advantage of all the benefits of the language and of the machine learning and data science libraries available for it, such as:

  • Pandas
  • OpenCV
  • Tesseract
  • Keras
  • NLTK and many others.

JUPYTER IS VERY GOOD! BUT WHAT CAN'T JUPYTER DO?

Jupyter is an absolutely amazing tool for drafting calculations and experiments. However, Jupyter cannot be relied on in production systems, especially in critical areas such as stock exchange calculations and medical analytics: a mistake in configuring the system can lead to catastrophic results, and its reliability leaves much to be desired.

Jupyter is suitable for experimental work, whereas Computations54 lets you run stable calculations and be confident in the results. Computations54 is a framework for bringing experiments conducted in Jupyter into production.

KEY FEATURES AND BENEFITS OF COMPUTATIONS54

[Figure: text mining in Computations54]

The main function of Computations54 is the intelligent processing and analysis of large volumes of data, followed by the identification of patterns and trends that help management make the right decisions.

Computations54 processes various types of data: text, numeric and others. By converting data into a structured form, the service makes any information available for analysis. You only need to load the data into Computations54, and it automatically starts processing it: structuring, sorting, analyzing and building a forecast. Computations54 supports different computation methods and different approaches to analysis.
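The stages named above (structuring, sorting, analyzing, forecasting) can be illustrated with a toy pipeline. This is a sketch, not Computations54's implementation: it assumes input arrives as `YYYY-MM-DD,value` CSV text, all function names are hypothetical, and the forecast is a simple least-squares linear trend chosen only for illustration.

```python
import csv
import io
from datetime import date


def structure(raw: str) -> list[tuple[date, float]]:
    """Parse 'YYYY-MM-DD,value' rows into typed tuples, skipping bad rows."""
    rows = []
    for record in csv.reader(io.StringIO(raw)):
        try:
            rows.append((date.fromisoformat(record[0]), float(record[1])))
        except (IndexError, ValueError):
            continue  # malformed row: drop it rather than fail the whole run
    return rows


def sort_rows(rows: list[tuple[date, float]]) -> list[tuple[date, float]]:
    """Order observations chronologically."""
    return sorted(rows, key=lambda r: r[0])


def analyze(rows: list[tuple[date, float]]) -> dict:
    """Basic descriptive statistics."""
    values = [v for _, v in rows]
    return {"count": len(values), "mean": sum(values) / len(values)}


def forecast_next(rows: list[tuple[date, float]]) -> float:
    """Fit y = a + b*x by least squares over x = 0..n-1 and predict x = n."""
    ys = [v for _, v in rows]
    n = len(ys)
    if n < 2:
        raise ValueError("need at least two observations for a trend")
    x_mean = (n - 1) / 2
    y_mean = sum(ys) / n
    sxx = sum((x - x_mean) ** 2 for x in range(n))
    sxy = sum((x - x_mean) * (y - y_mean) for x, y in enumerate(ys))
    slope = sxy / sxx
    # a = y_mean - slope * x_mean, so a + slope * n simplifies to:
    return y_mean + slope * (n - x_mean)


if __name__ == "__main__":
    raw = "2024-01-03,30\n2024-01-01,10\nbad row\n2024-01-02,20\n"
    rows = sort_rows(structure(raw))
    print(analyze(rows))        # {'count': 3, 'mean': 20.0}
    print(forecast_next(rows))  # 40.0
```

Keeping each stage a separate function means any one of them (for example, the forecasting method) can be swapped without touching the others.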

Much attention was paid to development convenience: you can discuss Computations, assign tasks, and save results.

[Figure: performing calculations]

Computations54 Advantages:

  1. High customization potential and scalability. You can customize calculation methods for specific tasks, analyze and structure huge volumes of text or numeric data, and output the processed information in the format of your choice.
  2. Multiprocessing. For example, if 50,000 files take 40 hours to process sequentially, processing them in parallel with 5 worker processes can cut that time substantially.
  3. Internal monitoring, which allows you to analyze the reliability of the results.
  4. Several levels of logic.
  5. Adaptation for different types of calculations (GPU, Intel MKL).
  6. Cluster computing with an extensible cluster.
  7. Ultimate programming flexibility.
  8. Data integrity control.
  9. The ability to create control calculations that the system uses as a reference.
  10. The ability to clone Computations and create various solutions based on them.
  11. Built-in functions with GPU computation capabilities.
  12. Connection to multiple databases from different sources.
  13. The ability for Computations to interact with each other via API and build cascades.
  14. Creation of control computations for building complex computing systems and dividing work between teams.
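The figures in item 2 only give a best case: with perfect linear scaling, 40 hours spread over 5 parallel workers is 40 / 5 = 8 hours, and real speed-up is lower because of process start-up, data transfer and disk contention. The article does not show Computations54's own multiprocessing API, so the sketch below uses Python's standard `concurrent.futures`; `process_file`, `process_all` and `ideal_hours` are hypothetical names, and the per-file job (counting bytes) is a placeholder.

```python
from concurrent.futures import ProcessPoolExecutor


def process_file(path: str) -> int:
    """Hypothetical per-file job: here it just counts the bytes in the file."""
    with open(path, "rb") as f:
        return len(f.read())


def process_all(paths: list[str], workers: int = 5) -> list[int]:
    """Run process_file over all paths using `workers` parallel processes.

    Executor.map returns results in the same order as the input paths;
    chunksize batches paths to reduce inter-process overhead.
    """
    if workers <= 1:
        return [process_file(p) for p in paths]  # sequential fallback
    with ProcessPoolExecutor(max_workers=workers) as pool:
        return list(pool.map(process_file, paths, chunksize=100))


def ideal_hours(sequential_hours: float, workers: int) -> float:
    """Best-case wall time, assuming perfect linear scaling and no overhead."""
    return sequential_hours / workers


if __name__ == "__main__":
    # The __main__ guard is required on platforms that spawn worker
    # processes (Windows, macOS) before calling process_all.
    print(ideal_hours(40, 5))  # 8.0
```

Processes rather than threads are used because CPU-bound Python code in threads is serialized by the global interpreter lock.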

[Figure: computation monitoring]

Using Computations54 minimizes the influence of the human factor on company processes and therefore significantly reduces the likelihood of errors.

ALTERNATIVE TO APACHE AIRFLOW

Computations54 is an alternative to Apache Airflow and can be used to process any data within an organization. With Computations54, you can design, schedule, and monitor complex workflows much more dynamically and quickly than with Airflow.

An extremely important feature is the ability to run graph-based data analysis and create workflows as directed acyclic graphs (DAGs) of tasks, as in Apache Airflow.
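DAG-style task execution can be sketched with the standard-library `graphlib` module (Python 3.9+). This is not Computations54's calculation controller, whose interface the article does not describe; `run_dag` and the task names are hypothetical, and for simplicity the tasks run one at a time in topological order.

```python
from graphlib import TopologicalSorter
from typing import Any, Callable


def run_dag(
    tasks: dict[str, Callable[[dict[str, Any]], Any]],
    dependencies: dict[str, set[str]],
) -> dict[str, Any]:
    """Execute tasks in an order that respects their dependencies.

    Each task receives the results of all tasks finished so far, so it can
    read its upstream outputs. graphlib.CycleError is raised if the
    dependencies contain a cycle, i.e. the graph is not a DAG.
    """
    sorter = TopologicalSorter()
    for name in tasks:
        deps = set(dependencies.get(name, ()))
        unknown = deps - set(tasks)
        if unknown:
            raise ValueError(f"{name} depends on unknown tasks: {sorted(unknown)}")
        sorter.add(name, *deps)
    results: dict[str, Any] = {}
    for name in sorter.static_order():
        results[name] = tasks[name](results)
    return results


if __name__ == "__main__":
    tasks = {
        "load": lambda r: [3, 1, 2],
        "sort": lambda r: sorted(r["load"]),
        "total": lambda r: sum(r["load"]),
        "report": lambda r: f"sorted={r['sort']} total={r['total']}",
    }
    deps = {"sort": {"load"}, "total": {"load"}, "report": {"sort", "total"}}
    print(run_dag(tasks, deps)["report"])  # sorted=[1, 2, 3] total=6
```

Because the tasks and their dependencies are plain dictionaries, the graph can also be generated programmatically at run time. A parallel scheduler would instead use `TopologicalSorter.get_ready()` and `done()` to launch independent tasks (here `sort` and `total`) concurrently.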

[Figure: directed acyclic graph]

At this stage, Computations54 supports only Python, but tasks execute faster than in Airflow thanks to its optimized calculation controller. Our framework is therefore suited to very fast, changeable processes, including metaprogrammed ones, whereas Apache Airflow is better suited to more static tasks.

CONCLUSIONS AND AREAS OF APPLICATION

Big data mining can be applied in many industries, so Computations54 has a wide range of applications. It can be used in any activity that requires analyzing data, performing calculations of any complexity, and making forecasts based on them.

Computations54 is used by investment funds engaged in exchange trading. It has also been used at the Booth School of Business in Chicago to analyze and identify problems in banks based on their public financial reporting.

Healthcare organizations store and generate large amounts of data containing important patient information in structured and unstructured form. Intelligent processing of these data sets can improve the efficiency of medical care by automating a range of tasks (predicting disease progression, making a diagnosis, prescribing treatment, etc.).

Computations54 can also be used in economic research, business, banking, and real estate valuation.

Using Computations54 gives you complete confidence in the final result of your calculations. Our tool is designed not only for scientific experiments; it has also proven its effectiveness in practical applications.