Financial statement analysis: a review and current issues

  • Andrew B. Jackson

A Literature Review of Financial Performance Measures and Value Relevance

  • Conference paper
  • First Online: 30 December 2017

  • Nattarinee Kopecká

Part of the book series: Springer Proceedings in Business and Economics (SPBE)

Performance measurement comprises a range of metrics and applications that serve as benchmarks for both internal and external users across business sectors. For managers, it indicates whether the company’s targets have been reached; for shareholders, it is a way of evaluating risks and returns. Performance measures are applied to almost every operational process, and the area is vast. The aim of this study is therefore to identify which kinds of financial measures are most closely linked to market value. The results show that financial measures remain favorable for companies because they provide relevant and meaningful information to shareholders. In particular, return on investment (ROI) and earnings are significantly relevant to market value, while the superiority of economic value added (EVA) remains unclear. Overall, companies still prefer traditional financial measures to other financial tools.

Acknowledgments

This paper has been dedicated to the research project "Analysis of strategic management accounting relation to company management and performance" (supported by Internal Grant Agency, No. IG 71/2017).

Author information

Authors and affiliations.

Department of Management Accounting, University of Economics, Prague, Nam. W. Churchilla 4, 130 67, Prague 3, Czech Republic

Nattarinee Kopecká

Corresponding author

Correspondence to Nattarinee Kopecká.

Editor information

Editors and affiliations.

Faculty of Finance and Accounting, University of Economics, Prague, Prague, Czech Republic

David Procházka

Copyright information

© 2018 Springer International Publishing AG

About this paper

Cite this paper.

Kopecká, N. (2018). A Literature Review of Financial Performance Measures and Value Relevance. In: Procházka, D. (eds) The Impact of Globalization on International Finance and Accounting. Springer Proceedings in Business and Economics. Springer, Cham. https://doi.org/10.1007/978-3-319-68762-9_42

DOI: https://doi.org/10.1007/978-3-319-68762-9_42

Published: 30 December 2017

Publisher Name: Springer, Cham

Print ISBN: 978-3-319-68761-2

Online ISBN: 978-3-319-68762-9

eBook Packages: Economics and Finance, Economics and Finance (R0)

Financial statement analysis: a review and current issues

1. Introduction

In this paper I review trends in the literature on financial statement analysis (FSA) and provide insights into the relevance of FSA research to emerging trends. FSA research is generally concerned with two key issues: improving fundamental analysis and identifying market inefficiencies with respect to financial statement information (Yohn, 2020). Improving fundamental analysis matters because it yields better forecasts of profitability and more accurate estimates of firm value. The identification of market inefficiencies is generally the realm of equity security analysts and quantitative funds that use certain firm or stock characteristics to select hedge portfolios in an attempt to beat the market.

In my survey of the literature, I focus on accounting journals [1], using a keyword search of “financial statement analysis”, “fundamental analysis”, and “forecast*” [2] on Web of Science. A substantial number of articles returned by this search, particularly by the “forecast*” term, relate to topics such as analyst and management earnings forecasts rather than directly to FSA research. Similarly, I exclude a number of papers on valuation and cost of capital that are not directly related to using financial statement information to identify market inefficiencies. After manually sorting through the 879 search results, I identify 79 papers that are directly related to what would traditionally be considered FSA research [3]. This supports the observation in Yohn (2020) that relatively few academics have been involved in this stream of research. The upshot, however, is that it is an area with vast opportunities for future research.

An important part of fundamental analysis is the use of a systematic forecasting procedure to estimate firm value. There are alternate methods, but three dominate: the discounted dividend model, the discounted cash flow model, and the residual income model. Under certain assumptions, it can be shown that all three methods provide identical valuations. The residual income model, introduced by Ohlson (1995), focuses on the value created by the firm, regardless of whether that value is distributed back to owners or whether it consists of cash. The usefulness of the residual income model to FSA research is that it expresses value in terms of financial statement summary measures. This provides...
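The claimed equivalence between valuation models can be illustrated with a small numerical sketch. The Python below is not taken from the paper. It assumes a finite forecast horizon, clean surplus accounting (closing book value equals opening book value plus earnings minus dividends), and a liquidating payout of closing book value at the horizon. The function names and the three-year forecast are hypothetical. Under those assumptions the discounted dividend value and the residual income value coincide.

```python
def dividend_discount_value(dividends, terminal_book_value, r):
    """Present value of dividends plus a liquidating payout of final book value."""
    horizon = len(dividends)
    pv_dividends = sum(d / (1 + r) ** t for t, d in enumerate(dividends, start=1))
    return pv_dividends + terminal_book_value / (1 + r) ** horizon


def residual_income_value(opening_book_value, earnings, dividends, r):
    """Opening book value plus discounted residual income.

    Residual income is earnings less a capital charge on opening book value.
    Returns the value and the closing book value at the horizon.
    """
    value = opening_book_value
    book = opening_book_value
    for t, (e, d) in enumerate(zip(earnings, dividends), start=1):
        residual_income = e - r * book          # charge uses opening book value
        value += residual_income / (1 + r) ** t
        book = book + e - d                     # clean surplus relation (assumed)
    return value, book


# Hypothetical three-year forecast, for illustration only.
earnings = [12.0, 14.0, 15.0]
dividends = [5.0, 6.0, 7.0]
rim_value, closing_book = residual_income_value(100.0, earnings, dividends, r=0.10)
ddm_value = dividend_discount_value(dividends, closing_book, r=0.10)
print(round(rim_value, 4), round(ddm_value, 4))
```

The two printed numbers agree because, under clean surplus, each period's residual income equals the dividend plus the change in book value adjusted for the capital charge, so the book value terms telescope.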

Financial Statement Analysis: How It’s Done, by Statement Type

Katrina Ávila Munichiello is an experienced editor, writer, fact-checker, and proofreader with more than fourteen years of experience working with print and online publications.

What Is Financial Statement Analysis?

Financial statement analysis is the process of analyzing a company’s financial statements for decision-making purposes. External stakeholders use it to understand the overall health of an organization and to evaluate financial performance and business value. Internal constituents use it as a monitoring tool for managing the finances.

Key Takeaways

  • Financial statement analysis is used by internal and external stakeholders to evaluate business performance and value.
  • Financial accounting calls for all companies to create a balance sheet, income statement, and cash flow statement, which form the basis for financial statement analysis.
  • Horizontal, vertical, and ratio analysis are three techniques that analysts use when analyzing financial statements.

How to Analyze Financial Statements

The financial statements of a company record important financial data on every aspect of a business’s activities. As such, they can be evaluated on the basis of past, current, and projected performance.

In general, financial statements are centered around generally accepted accounting principles (GAAP) in the United States. These principles require a company to create and maintain three main financial statements: the balance sheet, the income statement, and the cash flow statement. Public companies have stricter standards for financial statement reporting. Public companies must follow GAAP, which requires accrual accounting. Private companies have greater flexibility in their financial statement preparation and have the option to use either accrual or cash accounting.

Several techniques are commonly used as part of financial statement analysis. Three of the most important are horizontal analysis, vertical analysis, and ratio analysis. Horizontal analysis compares the values of line items across two or more years. Vertical analysis expresses each line item as a proportion of another item on the same statement, such as an expense as a percentage of sales. Ratio analysis uses ratio metrics to quantify relationships between line items.
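As a rough illustration of the first two techniques, the following Python sketch computes year-over-year changes (horizontal analysis) and common-size percentages (vertical analysis). The figures, the dictionary layout, and the function names are hypothetical and chosen only for the example.

```python
# Hypothetical income statement figures for two years (illustrative only).
income_statement = {
    "revenue":            {"2022": 1000.0, "2023": 1200.0},
    "cost_of_goods_sold": {"2022": 600.0,  "2023": 690.0},
    "operating_expenses": {"2022": 250.0,  "2023": 300.0},
}


def horizontal_analysis(statement, base_year, current_year):
    """Percentage change in each line item between two years."""
    return {
        item: (values[current_year] - values[base_year]) / values[base_year] * 100
        for item, values in statement.items()
    }


def vertical_analysis(statement, year, base_item="revenue"):
    """Each line item as a percentage of the base item in the same year."""
    base = statement[base_item][year]
    return {item: values[year] / base * 100 for item, values in statement.items()}


growth = horizontal_analysis(income_statement, "2022", "2023")
common_size = vertical_analysis(income_statement, "2023")
print(growth)
print(common_size)
```

Using revenue as the base item is a common choice for the income statement; total assets would play the same role for a balance sheet.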

Companies use the balance sheet, income statement, and cash flow statement to manage the operations of their business and to provide transparency to their stakeholders. All three statements are interconnected and create different views of a company’s activities and performance.

Balance Sheet

The balance sheet is a report of a company’s financial worth in terms of book value. It is broken into three parts: a company’s assets, liabilities, and shareholder equity. Short-term assets such as cash and accounts receivable can tell a lot about a company’s operational efficiency; liabilities include the company’s expense arrangements and the debt capital it is paying off; and shareholder equity includes details on equity capital investments and retained earnings from periodic net income. The balance sheet must balance: assets minus liabilities equal shareholder equity. This figure is considered a company’s book value and serves as an important performance metric that increases or decreases with the financial activities of a company.

Income Statement

The income statement breaks down the revenue that a company earns against the expenses involved in its business to provide a bottom line, meaning the net profit or loss. The income statement is broken into three parts that help to analyze business efficiency at three different points. It begins with revenue and the direct costs associated with revenue to identify gross profit. It then moves to operating profit, which subtracts indirect expenses like marketing costs, general costs, and depreciation. Finally, after deducting interest and taxes, the net income is reached.

Basic analysis of the income statement usually involves the calculation of gross profit margin, operating profit margin, and net profit margin, which each divide profit by revenue. Profit margin helps to show where company costs are low or high at different points of the operations.
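The three margins described above can be sketched in a few lines of Python. The function name and the input figures are hypothetical; the calculation simply follows the income statement from revenue down to net income and divides each profit level by revenue.

```python
def profit_margins(revenue, cost_of_goods_sold, operating_expenses, interest, taxes):
    """Gross, operating, and net profit margins, each as a fraction of revenue."""
    gross_profit = revenue - cost_of_goods_sold
    operating_profit = gross_profit - operating_expenses
    net_income = operating_profit - interest - taxes
    return {
        "gross_margin": gross_profit / revenue,
        "operating_margin": operating_profit / revenue,
        "net_margin": net_income / revenue,
    }


# Hypothetical figures, for illustration only.
margins = profit_margins(
    revenue=1200.0,
    cost_of_goods_sold=690.0,
    operating_expenses=300.0,
    interest=30.0,
    taxes=45.0,
)
print(margins)
```

Comparing the three values shows where costs bite: a large gap between gross and operating margin points to indirect expenses, while a large gap between operating and net margin points to interest and taxes.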

Cash Flow Statement

The cash flow statement provides an overview of the company’s cash flows from operating activities, investing activities, and financing activities. Net income is carried over to the cash flow statement, where it is included as the top line item for operating activities. As the name suggests, investing activities include cash flows involved with firm-wide investments. The financing activities section includes cash flow from both debt and equity financing. The bottom line shows how much cash a company has available.

Free Cash Flow and Other Valuation Statements

Companies and analysts also use free cash flow statements and other valuation statements to analyze the value of a company. Free cash flow statements arrive at a net present value by discounting the free cash flow that a company is estimated to generate over time. Private companies may keep a valuation statement as they progress toward potentially going public.
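The discounting step can be sketched as follows. This is a minimal Python example with hypothetical cash flows and a hypothetical 10% discount rate. It assumes cash flows arrive at the end of each year and deliberately omits a terminal value, which a fuller valuation would normally add.

```python
def present_value_of_fcf(free_cash_flows, discount_rate):
    """Discount a list of year-end free cash flows back to today."""
    return sum(
        fcf / (1 + discount_rate) ** year
        for year, fcf in enumerate(free_cash_flows, start=1)
    )


# Hypothetical forecast: 110 in year 1 and 121 in year 2, discounted at 10%.
pv = present_value_of_fcf([110.0, 121.0], discount_rate=0.10)
print(round(pv, 2))  # each cash flow is worth 100 today, so the total is 200.0
```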

Financial statements are maintained by companies daily and used internally for business management. In general, both internal and external stakeholders use the same corporate finance methodologies for maintaining business activities and evaluating overall financial performance.

When doing comprehensive financial statement analysis, analysts typically use multiple years of data to facilitate horizontal analysis. Each financial statement is also analyzed with vertical analysis to understand how different categories of the statement are influencing results. Finally, ratio analysis can be used to isolate some performance metrics in each statement and bring together data points across statements collectively.

Below is a breakdown of some of the most common ratio metrics:

  • Balance sheet: This includes asset turnover, quick ratio, receivables turnover, days to sales, debt to assets, and debt to equity.
  • Income statement: This includes gross profit margin, operating profit margin, net profit margin, tax ratio efficiency, and interest coverage.
  • Cash flow: This includes cash and earnings before interest, taxes, depreciation, and amortization (EBITDA). These metrics may be shown on a per-share basis.
  • Comprehensive: This includes return on assets (ROA) and return on equity (ROE), along with DuPont analysis.
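To show how ratios bring data points from different statements together, here is a small Python sketch of the three-factor DuPont decomposition, which splits ROE into net profit margin, asset turnover, and an equity multiplier. The inputs are hypothetical, and the sketch assumes end-of-period balance sheet figures; many analysts use period averages instead.

```python
def dupont_roe(net_income, revenue, total_assets, shareholder_equity):
    """Three-factor DuPont decomposition of return on equity."""
    net_margin = net_income / revenue                       # income statement
    asset_turnover = revenue / total_assets                 # income statement vs. balance sheet
    equity_multiplier = total_assets / shareholder_equity   # balance sheet leverage
    return {
        "net_margin": net_margin,
        "asset_turnover": asset_turnover,
        "equity_multiplier": equity_multiplier,
        "roe": net_margin * asset_turnover * equity_multiplier,
    }


# Hypothetical figures, for illustration only.
result = dupont_roe(
    net_income=135.0, revenue=1200.0, total_assets=2000.0, shareholder_equity=800.0
)
roa = 135.0 / 2000.0  # return on assets from the same inputs
print(result, roa)
```

The product of the three factors equals net income divided by equity, so the decomposition does not change ROE; it only shows whether profitability, efficiency, or leverage is driving it.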

What are the advantages of financial statement analysis?

The main point of financial statement analysis is to evaluate a company’s performance or value through a company’s balance sheet, income statement, or statement of cash flows. By using a number of techniques, such as horizontal, vertical, or ratio analysis, investors may develop a more nuanced picture of a company’s financial profile.

What are the different types of financial statement analysis?

Most often, analysts will use three main techniques for analyzing a company’s financial statements.

First, horizontal analysis involves comparing historical data. Usually, the purpose of horizontal analysis is to detect growth trends across different time periods.

Second, vertical analysis compares items on a financial statement in relation to each other. For instance, an expense item could be expressed as a percentage of company sales.

Finally, ratio analysis, a central part of fundamental equity analysis, compares line-item data. Price-to-earnings (P/E) ratios, earnings per share, or dividend yield are examples of ratio analysis.

What is an example of financial statement analysis?

An analyst may first look at a number of ratios on a company’s income statement to determine how efficiently it generates profits and shareholder value. For instance, gross profit margin shows the difference between revenues and the cost of goods sold as a share of revenue. A higher gross profit margin than competitors may be a positive sign for the company. At the same time, the analyst may observe that the gross profit margin has been increasing over nine fiscal periods, applying a horizontal analysis to the company’s operating trends.

  • DOI: 10.36713/epra18013
  • Corpus ID: 272073166

AN EXAMINATION OF FINANCIAL PERFORMANCE: A REVIEW STUDY

  • Ghata H. Shah, Dr. Dipesh Patel
  • Published in EPRA international journal of… 19 August 2024
  • Business, Economics

Financial Statement Analysis of ITC Limited

18 Pages Posted: 13 Jul 2020

Symbiosis College of Arts and Commerce

Date Written: June 19, 2020

Financial statements can say a lot about a company’s financial health and earning potential. These statements are analyzed and reviewed for decision-making purposes, a process known as financial analysis. Financial analysis helps stakeholders assess the financial performance of an organization, which in turn helps them make good investment decisions. This paper provides a detailed financial analysis of ITC Ltd in an attempt to assess the company’s efficiency and performance. The study focuses on the past and present performance of ITC Ltd over a period of five years to analyze trends.

Keywords: Financial Analysis, Ratio analysis, DuPont analysis

Neha Rawat (Contact Author)

Two decades of financial statement fraud detection literature review; combination of bibliometric analysis and topic modeling approach

Journal of Financial Crime

ISSN: 1359-0790

Article publication date: 19 April 2023

Issue publication date: 30 November 2023

Purpose

The emergence of machine learning has opened a new path for researchers: it allows them to supplement traditional manual methods of conducting a literature review and turn it into a smart literature review. This study aims to present a framework for incorporating machine learning into the analysis of financial statement fraud (FSF) literature. The framework facilitates the analysis of a large body of literature to show trends in the field and to identify the most productive authors, journals and potential areas for future research.

Design/methodology/approach

In this study, a framework was introduced that merges bibliometric analysis techniques such as word frequency, co-word analysis and coauthorship analysis with the Latent Dirichlet Allocation topic modeling approach. This framework was used to uncover subtopics from 20 years of financial fraud research articles. Furthermore, the hierarchical clustering method was used on selected subtopics to demonstrate the primary contexts in the literature on FSF.

Findings

This study contributes to the literature in two ways. First, it determines the top journals, articles, countries and keywords based on various bibliometric metrics. Second, using topic modeling followed by hierarchical clustering, it demonstrates the four primary contexts in FSF detection.

Research limitations/implications

In this study, the authors sought a comprehensive view of the studies on financial fraud conducted over two decades. However, the research has limitations that offer opportunities for future researchers. The first is language bias: this study focused on English-language articles, so other researchers are encouraged to consider other languages as well. The second is citation bias: the authors ranked the top articles by citation count, but judging on citations alone can be misleading. The study therefore suggests that researchers consider other measures to check citation quality and assess the studies’ precision by applying meta-analysis.

Originality/value

Despite the popularity of bibliometric analysis and topic modeling, there have been limited efforts to use machine learning for literature review. This novel approach of using hierarchical clustering on topic modeling results enabled us to uncover four primary contexts. Furthermore, this method allowed us to show the keywords of each context and highlight significant articles within each context.

  • Detection of financial fraud
  • Latent Dirichlet allocation
  • Topic modeling
  • Smart literature review
  • Bibliometric analysis

Soltani, M., Kythreotis, A. and Roshanpoor, A. (2023), "Two decades of financial statement fraud detection literature review; combination of bibliometric analysis and topic modeling approach", Journal of Financial Crime, Vol. 30 No. 5, pp. 1367-1388. https://doi.org/10.1108/JFC-09-2022-0227

Emerald Publishing Limited

Copyright © 2023, Milad Soltani, Alexios Kythreotis and Arash Roshanpoor.

Published by Emerald Publishing Limited. This article is published under the Creative Commons Attribution (CC BY 4.0) licence. Anyone may reproduce, distribute, translate and create derivative works of this article (for both commercial & non-commercial purposes), subject to full attribution to the original publication and authors. The full terms of this licence may be seen at http://creativecommons.org/licences/by/4.0/legalcode

1. Introduction

1.1 Description of the research topic

Almost 20 years have passed since the financial crisis caused by Enron’s financial fraud. Despite significant reforms such as the Sarbanes-Oxley Act and the creation of the Public Company Accounting Oversight Board, which oversees auditors’ quality, news of managers cooking the books continues. Even the legendary investor, the Oracle of Omaha, has fallen victim to financial fraud (Kollewe, 2014). To combat this issue, experts suggest financial insurance (Economist, 2014) and the use of blockchain technology to revolutionize financial transactions (Aicpa.org, 2020). Given these examples, one may ask: what are the causes of the financial fraud cycle? How can investors protect their capital by examining the quality of their financial statements? Committing financial statement fraud (FSF) is an unethical decision-making process, and detecting fraud is challenging because of its multifaceted nature. For example, 50% of fraud seen by SEC enforcement has been due to revenue recognition (e.g. cookie jar), and 35% is because of concealed liabilities and expenses. The rest is caused by inadequate disclosure in footnotes (Association of Certified Fraud Examiners, 2017).

Uncovering the new trend of articles by analyzing the keywords over two decades;

Grouping of the authors based upon the similarity of the content of articles;

Providing the top countries' and journals' collaboration in FSF detection literature; and

Explaining the primary contexts of FSF literature and providing the high-impact articles in each content based on citation rate.

Therefore, the results of this study will appeal to those interested in comprehensively exploring the FSF detection literature, such as auditors, investors, financial managers and researchers.

2. Literature review

2.1 A review of previous research

In this article, we combine two approaches, bibliometric analysis and topic modeling, to review the research literature on detecting financial fraud. Bibliometric analysis is a quantitative method that we use to analyze a large number of financial fraud studies and uncover emerging trends in FSF detection. This approach follows the guidelines presented by Donthu et al. (2021), and the software we use is VOSviewer. Table 1 shows the conditions required to use bibliometric analysis.
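The authors run their bibliometric analysis in VOSviewer, which is a desktop tool rather than a library. Purely to illustrate what word frequency and co-word (keyword co-occurrence) counting involve, here is a plain Python sketch using only the standard library. The keyword lists are hypothetical stand-ins for author keywords from a database export, and the function names are not from the article.

```python
from collections import Counter
from itertools import combinations

# Hypothetical author-keyword lists, one per paper (illustrative only).
papers = [
    ["fraud detection", "data mining", "financial statements"],
    ["fraud detection", "machine learning"],
    ["data mining", "fraud detection", "neural networks"],
]


def keyword_frequency(keyword_lists):
    """Number of papers in which each keyword appears."""
    return Counter(kw for kws in keyword_lists for kw in set(kws))


def co_occurrence(keyword_lists):
    """Number of papers in which each unordered keyword pair appears together."""
    pairs = Counter()
    for kws in keyword_lists:
        for a, b in combinations(sorted(set(kws)), 2):
            pairs[(a, b)] += 1
    return pairs


freq = keyword_frequency(papers)
pairs = co_occurrence(papers)
print(freq.most_common(2))
print(pairs.most_common(1))
```

Sorting each keyword set before pairing ensures that a pair is always stored under the same key regardless of the order in which the keywords were listed.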

In the second approach, we use topic modeling, following the guidelines presented by Asmussen and Møller (2019). Topic modeling is an unsupervised natural language processing technique that extracts topics from a collection of papers. For topic modeling we use Python; the essential libraries are pandas, Gensim and pyLDAvis. Table 2 summarizes the features of the topic modeling approach.
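To give a feel for what Latent Dirichlet Allocation does, the following is a minimal collapsed Gibbs sampler written in plain Python. It is a teaching sketch under stated assumptions, not the Gensim implementation the authors used: the hyperparameters `alpha` and `beta`, the iteration count, and the toy documents are all hypothetical, and a real study would rely on a tested library.

```python
import random


def lda_gibbs(docs, num_topics, iterations=200, alpha=0.1, beta=0.01, seed=0):
    """Collapsed Gibbs sampling for LDA on tokenised documents.

    Returns per-document topic proportions, topic-word counts, and the vocabulary.
    """
    rng = random.Random(seed)
    vocab = sorted({w for doc in docs for w in doc})
    word_id = {w: i for i, w in enumerate(vocab)}
    vocab_size = len(vocab)
    doc_topic = [[0] * num_topics for _ in docs]               # tokens per topic in each doc
    topic_word = [[0] * vocab_size for _ in range(num_topics)]  # word counts per topic
    topic_total = [0] * num_topics                              # tokens per topic overall
    assignments = []
    # Random initial topic for every token.
    for d, doc in enumerate(docs):
        z_d = []
        for w in doc:
            k = rng.randrange(num_topics)
            z_d.append(k)
            doc_topic[d][k] += 1
            topic_word[k][word_id[w]] += 1
            topic_total[k] += 1
        assignments.append(z_d)
    # Resample each token's topic conditional on all other assignments.
    for _ in range(iterations):
        for d, doc in enumerate(docs):
            for i, w in enumerate(doc):
                k, v = assignments[d][i], word_id[w]
                doc_topic[d][k] -= 1
                topic_word[k][v] -= 1
                topic_total[k] -= 1
                weights = [
                    (doc_topic[d][j] + alpha)
                    * (topic_word[j][v] + beta)
                    / (topic_total[j] + vocab_size * beta)
                    for j in range(num_topics)
                ]
                k = rng.choices(range(num_topics), weights=weights)[0]
                assignments[d][i] = k
                doc_topic[d][k] += 1
                topic_word[k][v] += 1
                topic_total[k] += 1
    theta = [
        [(n + alpha) / (len(doc) + num_topics * alpha) for n in counts]
        for doc, counts in zip(docs, doc_topic)
    ]
    return theta, topic_word, vocab


# Hypothetical tokenised titles, for illustration only.
docs = [
    ["fraud", "detection", "audit", "fraud"],
    ["audit", "fraud", "misstatement"],
    ["topic", "model", "text", "mining"],
    ["text", "mining", "topic", "corpus"],
]
theta, topic_word, vocab = lda_gibbs(docs, num_topics=2)
for row in theta:
    print([round(p, 2) for p in row])
```

Each row of `theta` is a document's estimated mix of topics and sums to one. With a corpus this small the topics are not meaningful; the point is only to show the counting and resampling steps.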

Table 3 shows five studies using bibliometric analysis and topic modeling formatted based on the research field, methodology, required data and data size.

2.2 Research innovation

Figure 1 shows interest over time based on Google Trends data [1] for the two approaches of bibliometric analysis and topic modeling over the past five years. As shown, despite the increasing trend for bibliometric analysis, interest in topic modeling has been greater.

This study addresses the following research questions (RQs):

RQ1. What has been the trend in FSF detection article publishing from 2001 to 2021?

RQ2. Which countries have the highest number of published articles in the field of FSF detection?

RQ3. How do the leading nations collaborate in publishing articles related to FSF detection?

RQ4. Which journals have the highest publication rates for articles on FSF detection?

RQ5. What has been the trend of the top journals in the FSF detection field from 2001 to 2021?

RQ6. Which are the most frequently used keywords in FSF detection articles based on co-occurrence measurement?

RQ7. What has been the trend of the top keywords in the field of FSF detection from 2001 to 2021?

RQ8. What are the main contexts in FSF detection and which keywords have a higher frequency within these contexts?

RQ9. Which articles have the highest citation rate in each context related to FSF detection?

3. Methodology

The methodology of this study is illustrated in Figure 2 . Essentially, the research process encompasses three phases. The initial phase, referred to as pre-processing, entails the provision of the relevant data from articles in an Excel format, followed by data cleansing. The next stage is topic modeling, where the LDA method is applied. The clustering method is then used to identify contexts associated with financial fraud detection. Finally, the post-processing stage highlights the most relevant articles for the selected contexts:

Loading papers: Generally, two primary databases for collecting scholarly publications include Web of Science (WoS) and Scopus. While WoS is one of the largest and most reliable databases for reviewing research literature, the journals in Scopus are more comprehensive ( Agarwal, 2016 ; Saba et al. , 2020 ). Therefore, we have used the Scopus database to find relevant articles.

Selecting papers: Distinguishing between fraud detection and FSF detection is necessary to collect the relevant articles. To investigate FSF detection, we use the keywords: “anomaly detection financial*” OR “detect fraudulent financial reporting*” OR “detect accounting fraud*” OR “detect misstatements*” OR “financial ratio risk detection*” OR “detect earning manipulation*” for the years 2001 to 2021. The search returned 1,496 articles. We then removed 124 articles that are not related to detecting financial fraud. Also, because one of the goals of this study is to categorize research into contexts based on author keywords, we excluded a further 296 articles that did not have author keywords. As a result, 1,076 articles were selected for analysis.

Descriptive analysis: In this study, we have used VOSviewer and Python to implement three methods of bibliometric analysis: co-word analysis, co-authorship analysis and bibliographic coupling. Co-word analysis groups keywords that are related to each other into the same thematic map. Co-authorship analysis shows the interaction between scholars in detecting financial fraud, including their affiliated institutions and countries. Bibliographic coupling assumes that two publications sharing common references are likely to be similar in content.

Cleaning documents: In this step, we break down the keywords into individual units and convert them to lowercase. Additionally, we eliminate any non-alphabetic symbols (such as punctuation marks) and words with fewer than three characters, as they have minimal impact on the topics being analyzed. Subsequently, we discard stop words such as “use.” Then, using a process called lemmatization, we reduce the keywords to their root words. Finally, we eliminate author keywords that occur only once, as they offer limited value in topic modeling.

Vectorization: Following the document cleaning process, we compile a list of keywords indicating the frequency at which each keyword appears in the training set. Subsequently, with the use of the count vectorizer, we convert each keyword into a vector representation.

Setting parameters for LDA: The LDA method comes with default parameter values. In this study, we have changed the burn-in time, the number of iterations and the seed value to achieve better results. In addition, the number-of-folds parameter has been dropped, since all the papers are used to run the model.

Topic modeling: In this study, we used the LDA method, a probabilistic and unsupervised modeling approach. This method allows us to categorize texts within a corpus into specific topics. It is vital at this stage to determine the optimal number of topics by means of the topic coherence metric. The coherence metric indicates how well a topic is supported by the reference corpus (Pedro, 2022). The optimal number of topics can be chosen by plotting the coherence score, which ranges from 0 to 1, on the vertical axis against the number of topics on the horizontal axis. The highest point on the graph indicates the ideal number of topics (Yadav, 2022).

LDA model outputs: The results produced by LDA are the LDA components, which show the significance of each keyword in relation to each topic, and the LDA matrix, which shows the relevance of each article to each topic based on the keywords used in the article. To clarify the outcome, an intertopic distance map is used, which visualizes the value of keywords in different topics based on their frequency of occurrence (Yadav, 2022).

Hierarchical clustering: We use hierarchical clustering to group similar topics into a single cluster. Clusters of similar topics form a context that is linked to articles. Afterwards, by introducing a new column, we record the cluster to which each article belongs.

Selecting the relevant topics: Once the membership of articles within each cluster is established, it becomes imperative to assign labels to each cluster. These labels signify the context of articles that share similar themes. The determination of the labels is based on the occurrence and recognition of significant keywords. Hence, we use the word cloud visualization approach which displays the frequency of keywords in each context.

Validating the results: The exploratory search ends with labeling the contexts, and it is necessary to check the validity of the results. Validity shows the extent to which the contexts’ labeling reflects the collection of keywords in each context. The validity of the results in this study is investigated through semantic validation, in which experts review the keywords in the word cloud to see if the naming of the contexts based on the observed keywords is justified (Asmussen and Møller, 2019). For this purpose, we will ask three finance and financial fraud experts to express their agreement with our labeling. We will also use the following formula to check the overall level of agreement or disagreement (Neuman, 2013):

$$\text{Validity percentage} = \frac{2 \times \text{number of agreements}}{\text{total number of codes}} \times 100 \qquad \text{(Eqn. 1)}$$

Introducing articles with the highest citation rate: In the last step, the top articles in each context are shown based on their citation rates. The citation rate of each article is calculated by dividing the total citations by the number of years that have passed since its publication. This analysis helps us to introduce the most important research topics within the field of study.
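
The cleaning and vectorization steps described above can be sketched in a few lines of standard-library Python. This is a minimal illustration, not the authors' code: the stop-word list, the sample keywords and the function names are assumptions, and lemmatization is left out because it needs an external library.

```python
import re
from collections import Counter

# Illustrative stop-word list; the study's actual list is not given in the text.
STOP_WORDS = {"use", "the", "and", "for", "with"}

def clean_keywords(raw_keywords):
    """Lowercase, split on non-alphabetic symbols, drop short tokens and stop words."""
    tokens = []
    for keyword in raw_keywords:
        for token in re.split(r"[^a-z]+", keyword.lower()):
            if len(token) >= 3 and token not in STOP_WORDS:
                tokens.append(token)
    return tokens

def build_vocabulary(documents, min_count=2):
    """Keep tokens that occur at least min_count times across all documents."""
    totals = Counter(token for doc in documents for token in doc)
    return sorted(t for t, n in totals.items() if n >= min_count)

def vectorize(documents, vocabulary):
    """Return one count vector per document, ordered like the vocabulary."""
    vectors = []
    for doc in documents:
        counts = Counter(doc)
        vectors.append([counts[t] for t in vocabulary])
    return vectors

# Hypothetical author keywords for three articles.
articles = [
    ["Anomaly Detection", "Machine-Learning", "use of AI"],
    ["Fraud detection", "machine learning"],
    ["Anomaly detection", "Benford's law"],
]
docs = [clean_keywords(a) for a in articles]
vocab = build_vocabulary(docs)      # tokens seen only once are dropped
matrix = vectorize(docs, vocab)     # document-term count matrix
```

The resulting count matrix plays the role of the count-vectorizer output that is passed to LDA in the topic modeling step.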

4. Analysis of publications

4.1 Descriptive details of extant publications

4.1.1 Articles published each year (RQ1).

Figure 3 shows the frequency of the articles published from 2001 to 2021. The publication trend shows that researchers’ interest in issues related to financial fraud detection has been increasing. We have demonstrated this trend with important events related to financial fraud and global markets. For instance, in 2002, Enron and WorldCom went bankrupt due to FSF. One of the consequences of this bankruptcy was the publication of Statement of Auditing Standards No. 99 (SAS No. 99) by the American Institute of Certified Public Accountants. Also, in 2008 the world faced the most significant financial crisis, which put considerable pressure on companies for financing.

4.1.2 Density visualization of top 20 countries collaborating in the detection of financial fraud (RQ2).

Figure 4 illustrates the top 20 countries that have published articles on FSF. The countries included in the data have published at least 14 articles on the topic. The United States of America and China are the leading countries in terms of publishing articles on FSF, with 248 and 134 articles respectively. This suggests that financial fraud is a prevalent issue worldwide and that various countries have contributed to the research in this field.

4.1.3 Network structure of the top 20 countries (RQ3).

Figure 5 shows the collaboration between the top 20 countries in publishing scholarly papers on FSF detection, based on the authors' affiliation information. The size of a node represents the number of published articles, and the width of an edge represents the degree of cooperation between two countries. The USA, the UK and China have the largest number of co-authorships, while countries with limited cooperation rely on local or individual authors.

4.1.4 Total number of published articles in each journal (RQ4).

Figure 6 shows journals that have published more than 10 articles on FSF detection.

As demonstrated, Lecture Notes in Computer Science with 35 articles, ACM International Conference Proceeding Series with 26 articles and Journal of Financial Crime with 15 articles are among the top journals in FSF detection.

4.1.5 Trend of journals in publishing articles (RQ5).

Figure 7 shows the trend of top journals publishing the highest number of articles annually.

As demonstrated, the journals Lecture Notes in Computer Science, ACM International Conference Proceeding Series, IEEE Access and Journal of Financial Crime have an increasing trend from 2001 to 2021.

4.1.6 Network structure of the top 50 keywords (RQ6).

Figure 8 demonstrates the network structure of the keywords based on the co-occurrence and shows how often each keyword is associated with other keywords. For this purpose, we set the minimum number of co-occurrence to 28, based on which 50 keywords with the highest co-occurrence are shown in Figure 8 . Different colors of the nodes represent the initial date of their publication.

The results in Figure 8 show that keywords such as big data, machine learning and decision tree are among the keywords that have recently attracted the attention of researchers. In contrast, keywords such as computer crime, administrative data processing and financial data processing have been used since 2012.
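
As a rough illustration of the co-occurrence measurement behind Figure 8, the sketch below counts how often pairs of author keywords appear in the same article and keeps the keywords whose total co-occurrence reaches a threshold. The sample data, the function names and the way the threshold is applied are assumptions; VOSviewer's internal computation may differ.

```python
from collections import Counter
from itertools import combinations

def cooccurrence_counts(keyword_lists):
    """Count the number of articles in which each unordered keyword pair appears."""
    pair_counts = Counter()
    for keywords in keyword_lists:
        unique = sorted({k.lower() for k in keywords})
        pair_counts.update(combinations(unique, 2))
    return pair_counts

def keyword_totals(pair_counts):
    """Sum the co-occurrences of each keyword over all of its pairs."""
    totals = Counter()
    for (first, second), n in pair_counts.items():
        totals[first] += n
        totals[second] += n
    return totals

# Hypothetical author keywords for three articles.
articles = [
    ["machine learning", "fraud detection", "big data"],
    ["Machine learning", "fraud detection"],
    ["fraud detection", "decision tree"],
]
pairs = cooccurrence_counts(articles)
totals = keyword_totals(pairs)

# Keep keywords whose total co-occurrence meets a minimum (the study uses 28).
MIN_COOCCURRENCE = 2
selected = sorted(k for k, n in totals.items() if n >= MIN_COOCCURRENCE)
```

In a network view, each selected keyword would be a node and each pair count an edge weight.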

4.1.7 Keyword trend (RQ7).

Table 4 shows the top 20 keywords in FSF detection. It shows that keywords such as anomaly detection, fraud detection, machine learning and data mining have the highest frequency in FSF detection.

Along the same lines, Figure 9 shows the trend of six keywords based on their frequency of usage from 2001 to 2021. As can be seen, the keywords anomaly detection, machine learning and fraud detection have an upward trend, indicating the increasing attention they have received from researchers over the years.

4.2 Topic modeling using Latent Dirichlet Allocation

4.2.1 Coherence scores.

The coherence score is used to determine the optimal number of topics for the corpus and was calculated for 100 candidate numbers of topics. The score reached its maximum of 0.65 at 42 topics, indicating that 42 topics are optimal. Figure 10 represents these 42 topics in a two-dimensional graph, with the intertopic distance map used to evaluate the content of each topic based on its keyword values.
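
The selection rule itself is simple: fit one model per candidate number of topics, record its coherence and keep the candidate with the highest score. The sketch below shows only that last step. The score values are invented for illustration, apart from the reported peak of 0.65 at 42 topics, and `best_topic_count` is a hypothetical helper name.

```python
def best_topic_count(coherence_by_k):
    """Return (k, score) for the highest coherence; ties go to the smaller k."""
    k = min(coherence_by_k, key=lambda n: (-coherence_by_k[n], n))
    return k, coherence_by_k[k]

# Illustrative scores only; just the peak (42 topics, 0.65) comes from the text.
scores = {10: 0.48, 20: 0.55, 30: 0.60, 42: 0.65, 50: 0.62, 100: 0.51}

k, score = best_topic_count(scores)
print(f"optimal number of topics: {k} (coherence {score:.2f})")
```

Breaking ties toward the smaller number of topics is a design choice made here for parsimony; the text does not say how ties were handled.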

4.2.2 Hierarchical clustering.

At this stage, hierarchical clustering is used to reveal the primary contexts related to identifying financial fraud by grouping similar topics. In this method, each article is initially considered a cluster, and at each stage, the clusters closest to each other merge into a larger cluster. To find the optimal number of clusters, we have used a dendrogram, the tree diagram commonly used with hierarchical clustering. The optimal number of clusters is found by drawing a horizontal line through the largest vertical distance between successive merges and counting the branches that the line crosses. Accordingly, the optimal number of clusters is four, and each article belongs to exactly one cluster, numbered from one to four. Because the four clusters were formed by merging similar topics, it can be concluded that the articles in each of the four clusters refer to a different context of FSF detection.
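
The dendrogram rule can be made concrete with a small pure-Python sketch. It assumes average linkage and Euclidean distance, neither of which is stated in the text, and it uses eight made-up two-dimensional points in place of the real topic weights.

```python
import math

def agglomerate(points):
    """Average-linkage agglomerative clustering.

    Returns the merge heights and, for each step, a snapshot of the clusters
    (history[m] is the partition after m merges).
    """
    clusters = [[i] for i in range(len(points))]
    heights = []
    history = [[list(c) for c in clusters]]
    while len(clusters) > 1:
        best = None
        for i in range(len(clusters)):
            for j in range(i + 1, len(clusters)):
                total = sum(math.dist(points[a], points[b])
                            for a in clusters[i] for b in clusters[j])
                d = total / (len(clusters[i]) * len(clusters[j]))
                if best is None or d < best[0]:
                    best = (d, i, j)
        d, i, j = best
        clusters[i] = clusters[i] + clusters[j]
        del clusters[j]
        heights.append(d)
        history.append([list(c) for c in clusters])
    return heights, history

def cut_at_largest_gap(heights, history):
    """Cut the tree inside the largest vertical gap between successive merges."""
    gaps = [heights[i + 1] - heights[i] for i in range(len(heights) - 1)]
    i = max(range(len(gaps)), key=gaps.__getitem__)
    return history[i + 1]

# Hypothetical 2-D points forming four well-separated pairs.
points = [(0, 0), (0, 1), (10, 0), (10, 1), (0, 10), (0, 11), (10, 10), (10, 11)]
heights, history = agglomerate(points)
clusters = cut_at_largest_gap(heights, history)
```

With these points the four pairwise merges happen at height 1 and the next merge at height 10, so the largest gap lies between them and the cut leaves four clusters.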

The four contexts cover 10%, 80%, 6% and 4.6% of all published articles, respectively. Therefore, contexts two and four have the highest and the lowest number of articles, respectively. In context one, the journals Lecture Notes in Computer Science, Procedia Computer Science and ACM International Conference Proceeding Series have the highest number of published articles. In context two, the journals Lecture Notes in Computer Science, ACM International Conference Proceeding Series, Audit and Journal of Financial Crime have the highest number of published articles. In context three, IEEE Access, Lecture Notes in Computer Science, Journal of Critical Reviews and Studies in Computational Intelligence have the highest number of published articles. Finally, in context four, the Managerial Auditing Journal, Issues in Accounting Education and the Journal of Financial Crime have the highest number of published articles. Therefore, Lecture Notes in Computer Science and the Journal of Financial Crime have covered various contexts of FSF detection.

4.3 Select relevant topics (RQ8)

The article analyzes the content of various topics based on the frequency of keywords. Out of 3,406 unique keywords, the most repeated keywords in the field of FSF detection from 2001 to 2021 include anomaly detection, fraud, data mining, deep learning audit, clustering audit, Benford law, outlier detection and machine learning. These keywords are shown in the word cloud graph in Figure 11 .

4.3.1 Selected relevant labels for context one.

In this part, we aim to select relevant labels based on the frequency of keywords in each context. Figure 12 shows the keywords leading to the creation of context one.

Two FSF detection methods are human detection and machine detection. The results in Figure 12 show that, when topic modeling is applied, the keywords related to machine detection are more similar to one another and therefore fall into context one. Human detection, such as a whistleblowing system, can be a useful tool in detecting FSF, as it can help to bring attention to fraudulent activity that might otherwise go undetected. However, human detection is not the only method of financial fraud detection, and it is not always reliable or effective on its own. Financial fraud detection often involves combining different techniques, such as data analysis and machine learning algorithms.

The first group of keywords consists of feature selection, principal component analysis, feature extraction and dimensionality reduction. Generally, FSF detection involves the study of extensive financial data, in which the relevant financial variables and financial ratios must be identified. Then, by applying data mining techniques, organizations are classified into two categories: fraudulent and non-fraudulent. However, if the data set includes many irrelevant and correlated features, the curse of dimensionality arises and reduces the classification performance. Therefore, it is necessary to reduce the number of irrelevant features in the data set by using dimensionality reduction techniques such as feature selection and principal component analysis (see, for example, Gupta and Mehta, 2020).

The second group of keywords consists of text mining, neural network, classification, one class classification, artificial intelligence, time series analysis, graph mining, visual analytics, random forest, regression, unsupervised learning, decision tree, k-means, fuzzy logic, supervised learning, time series prediction and correlation. Generally, artificial intelligence and machine learning follow two approaches: supervised and unsupervised learning. Supervised learning works on labeled data sets and includes classification algorithms (e.g. support vector machine, decision tree and random forest) and regression algorithms (e.g. linear regression and logistic regression). Unsupervised learning analyzes unlabeled data sets and includes methods such as clustering and association (see, for example, Ashtiani and Raahemi, 2021). According to these explanations, the keywords of the first context fit the title “fraud detection techniques” for cluster one.

4.3.2 Selected relevant label for context two.

Figure 13 shows the frequency of keywords leading to the creation of context two.

Cluster two includes keywords such as fraudulent financial reporting, FSF, earning management, corporate governance, cooking the books, fraud prevention, fraud triangle, fraud diamond and the pentagon model. The bankruptcy of companies such as Enron and WorldCom increased attention to the quality of financial reports, and researchers began to investigate the causal factors associated with an increased probability of fraud as well as the consequences of financial fraud (see, for example, Rezaee, 2005). Other studies have investigated different proxies for measuring the fraud factors in the fraud triangle, diamond and pentagon models (see, for example, Skousen et al., 2009). Based on the provided explanations, we select the title “Causes and deterrence of FSF” for cluster two.

4.3.3 Selected relevant label for context three.

Figure 14 shows the keywords leading to the creation of context three based on the frequency of keywords.

Cluster three includes the keywords digital forensics, network security, wireless sensor network, cloud computing, data privacy, malware, DDoS, information security, cyber security, online transaction and website defacement attack. These keywords relate to transactions and frauds carried out through computers and the internet, which can lead to the loss of intellectual property and company secrets and to major financial damage. Articles in this area can cover a variety of topics related to digital forensics. Cyber security measures can be an important tool in preventing and detecting fraud in financial statements. For example, implementing strong access controls, monitoring financial transactions and maintaining accurate audit logs can help identify manipulation of financial data. Also, digital forensics can be used to analyze the digital evidence that has been collected and to find and recover hidden information. For example, some studies focus on methods for detecting data manipulation, that is, the unauthorized modification of a system (e.g. data leakage malware, salami technique) to disrupt its normal function (see, for example, Nicholls et al., 2021). Other studies aim to identify factors related to preventing online fraud and protecting data, such as security auditing and data classification (see, for example, Soomro et al., 2021). Based on the provided explanations, we select the title “computer and online transaction fraud” for cluster three.

4.3.4 Selected relevant label for context four.

Figure 15 shows the keywords leading to the formation of context four based on the frequency criteria.

Context four includes the keywords audit risk plan, audit difference, audit analytics, audit procedures, audit software, audit planning, audit risk, audit standards, auditor experience, auditor liability, audit adjustments, audit effort, external audit, auditor independence, audit sampling, audit evidence and audit committee effectiveness. In general, studies associated with auditors’ fraud-related responsibilities can be divided into two groups: internal audits and external audits. Internal auditors are better positioned to detect financial fraud due to their proximity to and understanding of the organization (Association of Certified Fraud Examiners, 2017). Therefore, the first group of studies has addressed the role of internal auditors in risk management (see, for example, Tamimi, 2021). Within this group, other studies have examined the function of internal auditors under the influence of mediating factors such as corporate governance and gender diversity (see, for example, Pazarskis et al., 2021). The second group of studies is related to external auditors. For instance, some studies have investigated the effect of external audit quality on the possibility of identifying financial fraud (see, for example, Qawqzeh et al., 2021). Other studies have investigated new trends in auditing financial statements (see, for example, Lim, 2021). Based on the provided explanations, we select the title “auditors’ fraud-related responsibilities” for context four.

4.4 Validity test of the labeling for the contexts

At this stage, it is necessary to examine the validity of the labeling for the four contexts. To this end, we provided the keywords of each context to the experts and asked them to express their agreement with the keywords related to each context. If the calculated validity percentage is more than 60%, the context has sufficient validity (Neuman, 2013). Table 5 shows the level of agreement with the labeling, based on the keywords of each context, of three finance and financial fraud experts.
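
As a worked check, the sketch below applies the validity formula from the methodology (twice the number of agreements, divided by the total number of codes, times 100) to the keyword totals and approval counts listed in Table 5. Rounding down to a whole percentage is an assumption; it is chosen because it reproduces the percentages printed in the table.

```python
import math

def validity_percentage(agreements, total_codes):
    """Eqn. 1: 2 x number of agreements / total number of codes x 100."""
    return 2 * agreements / total_codes * 100

# (context label, total keywords in cluster, approvals by experts 1-3), from Table 5.
table5 = [
    ("Fraud detection techniques", 244, [90, 86, 86]),
    ("Causes and deterrence of financial statement fraud", 2834, [885, 865, 870]),
    ("Computer and online transaction fraud", 170, [70, 65, 70]),
    ("Auditors' fraud-related responsibilities", 158, [75, 68, 60]),
]

THRESHOLD = 60  # minimum validity percentage stated in the text

results = {}
for label, total, approvals in table5:
    # Assumption: percentages are truncated, which matches the table values.
    scores = [math.floor(validity_percentage(a, total)) for a in approvals]
    results[label] = scores
    print(label, scores, "valid" if min(scores) > THRESHOLD else "not valid")
```

Every expert's score exceeds the 60% threshold, with the second context closest to it.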

4.5 Introducing top articles for main contexts in financial statement fraud detection (RQ9)

In this section, the top articles of each context are introduced in Table 6 . For this purpose, the citation rate index is used, and the articles are sorted based on this index. Articles with context number one are related to fraud detection techniques. Articles with context number two are related to the causes and deterrence of FSF. Articles with context number three are related to computer and online transaction fraud. Articles with context number four are related to auditors’ fraud-related responsibilities.
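
The citation rate index is the total number of citations divided by the number of years since publication. The sketch below recomputes it for three rows of Table 6. The reference year of 2022 is an assumption inferred from the table (for example, 171 citations for a 2016 article give the listed rate of 28.5 only with six elapsed years), and the function name is hypothetical.

```python
def citation_rate(citations, pub_year, reference_year=2022):
    """Citations per year elapsed since publication."""
    years = reference_year - pub_year
    if years <= 0:
        raise ValueError("publication year must precede the reference year")
    return citations / years

# (title, total citations, publication year), copied from Table 6.
articles = [
    ("The evolution of fraud theory", 154, 2012),
    ("Graph based anomaly detection and description: A survey", 678, 2015),
    ("Intelligent financial fraud detection: A comprehensive review", 171, 2016),
]

# Sort from the highest to the lowest citation rate.
ranked = sorted(
    ((title, citation_rate(cites, year)) for title, cites, year in articles),
    key=lambda item: item[1],
    reverse=True,
)
for title, rate in ranked:
    print(f"{rate:6.2f}  {title}")
```

Dividing by elapsed years keeps older articles from dominating the ranking simply because they have had longer to accumulate citations.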

5. Conclusion

This study aims to review the research literature concerning FSF detection from 2001 to 2021. Accordingly, 1,496 articles were extracted from the Scopus database. After screening the articles, we selected 1,076 papers for analysis. To analyze the literature on FSF detection, we first used the bibliometric approach and revealed that the number of articles published during the past two decades has an upward trend. We also identified the top 20 countries with the highest number of articles published. The results showed that the USA and China are the leading countries in content production in FSF detection. Also, coauthors from the USA and China cooperate more with other countries. Then, we reviewed the journals and identified the top journals with more than 10 articles during the past two decades. Our results also revealed that the journals Lecture Notes in Computer Science, ACM International Conference Proceeding Series, IEEE Access and Journal of Financial Crime have an increasing trend in publishing content on FSF detection. Next, we analyzed the keywords and showed that keywords such as decision trees, machine learning and big data have recently attracted researchers’ attention. We also introduced the 20 most frequent keywords in the literature on FSF detection. The analysis of keywords such as anomaly detection, machine learning and fraud detection showed a growing trend in their use in recent years.

Next, using the LDA modeling method, we identified 42 related topics. We then identified the contexts by applying hierarchical clustering. The examination of the four clusters revealed that the Journal of Financial Crime and Lecture Notes in Computer Science include more diverse topics because of their presence in most contexts. Then, using word clouds, we displayed the keywords of each context and, based on the analysis, identified the four labels: fraud detection techniques, causes and deterrence of FSF, computer and online transaction fraud and auditors’ fraud-related responsibilities. Finally, we introduced the top articles in each context based on citation rate.

6. Limitation of the study

In this study, we tried to comprehensively view the studies related to financial fraud conducted over two decades. However, this research has limitations that can be an opportunity for future researchers. The first limitation is due to language bias. We have focused on English language articles, so it is suggested that other researchers consider other languages as well. The second limitation is caused by citation bias. In this study, we tried to show the top articles based on the citation criteria. However, judging based on citation alone can be misleading. Therefore, we suggest that the researchers consider other measures to check the citation quality and assess the studies’ precision by applying meta-analysis.

Figure 1. Trend of interest in bibliometric analysis and topic modelling based on Google Trends

Figure 2. Research methodology to analyze articles in financial statement fraud detection

Figure 3. Year-wise publication of detection financial fraud between 2001 and 2021

Figure 4. Top countries contributing to publishing financial statement fraud detection articles

Figure 5. Collaboration map of the top 20 countries in financial statement fraud detection

Figure 6. Top journals with the highest number of published articles

Figure 7. The trend of the top journals in the field of financial fraud detection annually

Figure 8. Network structure for the top 50 keywords concerning financial statement fraud detection

Figure 9. The trend of the number of keywords used over the years 2001–2021

Figure 10. Intertopic distance map for the 42 financial statement fraud detection topics

Figure 11. Visualization of 3,406 unique keywords in the field of financial statement fraud

Figure 12. Selecting the title of “fraud detection techniques” for context one

Figure 13. Selecting the title of “causes and deterrence of financial statement fraud” for context two

Figure 14. Selecting the title of “computer and online transaction fraud” for context three

Figure 15. Selecting the title of “auditors' fraud-related responsibilities” for context four

Table 1. Requirements for using bibliometric analysis

| Dataset | Scope | When not to use | When to use | Goal | Methodology |
| --- | --- | --- | --- | --- | --- |
| Large | Broad | When the papers are highly heterogeneous or only a small number of papers is available | When the scope of the review is broad and the data is too large for manual review | Summarizing a large number of articles and uncovering past and emerging trends | Bibliometric |

(2021)

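Table 2. Features of the topic modeling approach

| Method | Categories of papers are known? | Coding can be automated? | Person hours spent | Person hours spent interesting results |
| --- | --- | --- | --- | --- |
| Topic modeling | No | Yes | Low | Moderate |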

Table 3. Studies using bibliometric analysis and topic modeling

| Reference | Intended use | Method | Data requirement | Size of data |
| --- | --- | --- | --- | --- |
| ( , 2022) | Review of anaerobic digestion technology | Bibliometric analysis | Academic literature | 6,854 articles |
| ( , 2021) | Review of entrepreneurship and crisis literature | Bibliometric analysis | Academic literature | 1,044 articles |
| ( , 2021) | Sentic computing | LDA and bibliometric analysis | Academic literature | 308 articles |
| ( , 2020) | Review of AI research in marketing | LDA and scientometric analysis | Academic literature | 214 articles |
| ( , 2021) | Information management | LDA and conceptual structure analysis | Academic literature | 19,916 articles |

Source: Author contribution

Table 4. Top 20 keywords in FSF detection

| Index | Author’s keywords | Frequency |
| --- | --- | --- |
| 1 | Anomaly detection | 249 |
| 2 | Fraud detection | 98 |
| 3 | Machine learning | 62 |
| 4 | Data mining | 47 |
| 5 | Audit | 37 |
| 6 | Principal component analysis | 35 |
| 7 | Earning management | 23 |
| 8 | Deep learning | 21 |
| 9 | Clustering | 20 |
| 10 | Corporate governance | 16 |
| 11 | Unsupervised learning | 15 |
| 12 | Benford’s law | 15 |
| 13 | Network security | 13 |
| 14 | Fraud triangle | 12 |
| 15 | Random forest | 11 |
| 16 | Feature extraction | 11 |
| 17 | Neural network | 11 |
| 18 | Audit quality | 10 |
| 19 | Decision tree | 10 |
| 20 | Internal control | 10 |

Source: Author contribution

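Table 5. Level of agreement with the labeling of each context by three experts

| Interview no. | Title of context | Context no. | Total no. of keywords in cluster | No. of approved keywords | Validity of labeling (%) |
| --- | --- | --- | --- | --- | --- |
| 1 | Fraud detection techniques | 1 | 244 | 90 | 73 |
| 2 | Fraud detection techniques | 1 | 244 | 86 | 70 |
| 3 | Fraud detection techniques | 1 | 244 | 86 | 70 |
| 1 | Causes and deterrence of financial statement fraud | 2 | 2,834 | 885 | 62 |
| 2 | Causes and deterrence of financial statement fraud | 2 | 2,834 | 865 | 61 |
| 3 | Causes and deterrence of financial statement fraud | 2 | 2,834 | 870 | 61 |
| 1 | Computer and online transaction fraud | 3 | 170 | 70 | 82 |
| 2 | Computer and online transaction fraud | 3 | 170 | 65 | 76 |
| 3 | Computer and online transaction fraud | 3 | 170 | 70 | 82 |
| 1 | Auditors’ fraud-related responsibilities | 4 | 158 | 75 | 94 |
| 2 | Auditors’ fraud-related responsibilities | 4 | 158 | 68 | 86 |
| 3 | Auditors’ fraud-related responsibilities | 4 | 158 | 60 | 75 |

Source: Author contribution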

Table 6. Top articles in each context by citation rate

| Context no. | Citations | Citation rate | Title | Author | Year |
| --- | --- | --- | --- | --- | --- |
| 2 | 678 | 96.85 | Graph based anomaly detection and description: A survey | Akoglu L., | 2015 |
| 1 | 166 | 55.33 | Real-time big data processing for anomaly detection: A Survey | Habeeb R.A., | 2019 |
| 2 | 32 | 32 | An Integrated Cluster Detection, Optimization, and Interpretation Approach for Financial Data | Li T., Kou G., Peng Y., Yu P.S. | 2021 |
| 2 | 171 | 28.5 | Intelligent financial fraud detection: A comprehensive review | West J., Bhattacharya M. | 2016 |
| 1 | 45 | 22.5 | Detecting Accounting Fraud in Publicly Traded U.S. Firms Using a Machine Learning Approach | Bao Y., | 2020 |
| 2 | 154 | 15.4 | The evolution of fraud theory | Dorminey J., | 2012 |
| 4 | 36 | 12 | The role of audit in the fight against corruption | Jeppesen | 2019 |
| 4 | 67 | 9.57 | Materiality guidance of the major public accounting firms | Eilifsen A., Messier W.F., Jr. | 2015 |
| 4 | 99 | 9 | Financial statement fraud detection: An analysis of statistical and machine learning algorithms | Perols J. | 2011 |
| 1 | 169 | 8.89 | Earnings Manipulation in Failing Firms | Rosner R.L. | 2003 |
| 1 | 53 | 8.83 | Unsupervised learning for robust Bitcoin fraud detection | Monamo P., | 2016 |
| 3 | 26 | 8.66 | Situ: Identifying and explaining suspicious behavior in networks | Goodall J.R., | 2019 |
| 3 | 33 | 8.25 | Malware analysis and detection using data mining and machine learning classification | Chowdhury M., | 2018 |
| 4 | 96 | 8 | The world has changed - Have analytical procedure practices? | Trompeter G., Wright A. | 2010 |
| 3 | 17 | 5 | A flow-based approach for Trickbot banking trojan detection | Gezer A., | 2019 |
| 3 | 12 | 4 | Stock Price Manipulation Detection using Generative Adversarial Networks | Leangarun T., | 2019 |

Source: Author contribution

https://trends.google.com/trends/

Agarwal, A., Durairajanayagam, D., Tatagari, S., Esteves, S., Harlev, A., Henkel, R., Roychoudhury, S., Homa, S., Puchalt, N., Ramasamy, R., Majzoub, A., Ly, K., Tvrda, E., Assidi, M., Kesari, K., Sharma, R., Banihani, S., Ko, E., Abu-Elmagd, M. and Gosalvez, J. (2016), “Bibliometrics: tracking research impact by selecting the appropriate metrics”, Asian Journal of Andrology, Vol. 18 No. 2, p. 296, (online), doi: 10.4103/1008-682x.171582.

Aicpa.org (2020), “Blockchain versus financial statement fraud”, (online), available at: www.aicpa.org/professional-insights/download/blockchain-versus-financial-statement-fraud (accessed 17 August 2022).

Ampese, L.C., Sganzerla, W.G., Di Domenico Ziero, H., Mudhoo, A., Martins, G. and Forster-Carneiro, T. (2022), “Research progress, trends, and updates on anaerobic digestion technology: a bibliometric analysis”, Journal of Cleaner Production, Vol. 331, p. 130004, (online), doi: 10.1016/j.jclepro.2021.130004.

Ashtiani, M.N. and Raahemi, B. (2021), “Intelligent fraud detection in financial statements using machine learning and data mining: a systematic literature review”, IEEE Access, Vol. 10, p. 1, (online), doi: 10.1109/access.2021.3096799.

Asmussen, C.B. and Møller, C. (2019), “Smart literature review: a practical topic modeling approach to exploratory literature review”, Journal of Big Data, Vol. 6 No. 1, pp. 1-18, (online), doi: 10.1186/s40537-019-0255-7.

Association of Certified Fraud Examiners (2017), Fraud Examiner's Manual, Association of Certified Fraud Examiners, Austin, TX.

Chen, X., Xie, H., Cheng, G. and Li, Z. (2021), “A decade of sentic computing: topic modeling and bibliometric analysis”, Cognitive Computation, Vol. 14 No. 1, pp. 24-47, (online), doi: 10.1007/s12559-021-09861-6.

Donthu, N., Kumar, S., Mukherjee, D., Pandey, N. and Lim, W.M. (2021), “How to conduct a bibliometric analysis: an overview and guidelines”, Journal of Business Research, Vol. 133, pp. 285-296, (online), doi: 10.1016/j.jbusres.2021.04.070.

Gupta, S. and Mehta, S.K. (2020), “Feature selection for dimension reduction of financial data for detection of financial statement frauds context to Indian companies”, Global Business Review, (online), doi: 10.1177/0972150920928663.

Kollewe, J. (2014), “Warren Buffett: ‘Tesco was a huge mistake’”, The Guardian, retrieved from www.theguardian.com/business/2014/oct/02/warren-buffet-tesco-huge-mistake

Lim, F.X. (2021), “Emerging technologies to detect fraud in audit testing: a perception of Malaysian big four auditors”, SSRN Electronic Journal, (online), doi: 10.2139/ssrn.3877347.

Mustak, M., Salminen, J., Plé, L. and Wirtz, J. (2020), “Artificial intelligence in marketing: topic modeling, scientometric analysis, and research agenda”, Journal of Business Research, Vol. 124, pp. 389-404, (online), doi: 10.1016/j.jbusres.2020.10.044.

Neuman, W.L. (2013), Workbook for Social Research Methods: Qualitative and Quantitative Approaches, 7th ed., Allyn and Bacon, Pearson Higher Education, Boston.

Nicholls, J., Kuppa, A. and Le-Khac, N.A. (2021), “Financial cybercrime: a comprehensive survey of deep learning approaches to tackle the evolving financial crime landscape”, IEEE Access, Vol. 9, pp. 163965-163986, (online), doi: 10.1109/access.2021.3134076.

Pazarskis, M., Koutoupis, A.G., Kyriakou, M.I. and Galanis, S. (2021), “Corporate governance and internal audit at Greek municipal enterprises in the COVID-19 era”, in Proceedings of the Corporate Governance: An Interdisciplinary Outlook in the Wake of Pandemic Conference, 19-20 November 2020, Virtus Interpress, Sumy, Ukraine, (online), pp. 142-146, doi: 10.22495/cgsetpt21.

Pedro, J. (2022), “[review of understanding topic coherence measures] from towards data science”, (online), available at: https://towardsdatascience.com/understanding-topic-coherence-measures-4aa41339634c (accessed 19 August 2022).

Qawqzeh, H.K., Endut, W.A. and Rashid, N. (2021), “Board components and quality of financial reporting: mediating effect of audit quality”, Journal of Contemporary Issues in Business and Government, Vol. 27 No. 2, p. 179, (online), doi: 10.47750/cibg.2021.27.02.023.

Rezaee , Z. ( 2005 ), “ Causes, consequences, and deterence of financial statement fraud ”, Critical Perspectives on Accounting , Vol. 16 No. 3 , pp. 277 - 298 , doi: 10.1016/s1045-2354(03)00072-8 .

Ebrahim , S.A. , Poshtan , J. , Jamali , S.M. and Ebrahim , N.A. ( 2020 ), “ Quantitative and qualitative analysis of time-series classification using deep learning ”, IEEE Access , Vol. 8 , pp. 90202 - 90215 .

Sharma , A. , Rana , N.P. and Nunkoo , R. ( 2021 ), “ Fifty years of information management research: a conceptual structure analysis using structural topic modeling ”, International Journal of Information Management , Vol. 58 , p. 102316 , ( online ), doi: 10.1016/j.ijinfomgt.2021.102316 .

Skousen , C.J. , Smith , K.R. and Wright , C.J. ( 2009 ), “ Detecting and predicting financial statement fraud: the effectiveness of the fraud triangle and SAS no. 99 ”, in Corporate Governance and Firm Performance , Emerald Group Publishing Limited , Bingley, USA , Vol. 13 , pp. 53 - 81 .

Soomro , Z.A. , Shah , M.H. and Thatcher , J. ( 2021 ), “ A framework for ID fraud prevention policies in the E-tailing sector ”, Computers and Security , Vol. 109 , p. 102403 , ( online ), doi: 10.1016/j.cose.2021.102403 .

Tamimi , O. ( 2021 ), “ The role of internal audit in risk management from the perspective of risk managers in the banking sector ”, Australasian Business, Accounting and Finance Journal , Vol. 15 No. 2 , pp. 114 - 129 , ( online ), doi: 10.14453/aabfj.v15i2.8 .

The Economist ( 2014 ), “ The dozy watchdogs ”, Retrieved from www.economist.com/briefing/2014/12/11/the-dozy-watchdogs

Xu , Z. , Wang , X. , Wang , X. and Skare , M. ( 2021 ), “ A comprehensive bibliometric analysis of entrepreneurship and crisis literature published from 1984 to 2020 ”, Journal of Business Research , Vol. 135 , pp. 304 - 318 , ( online ), doi: 10.1016/j.jbusres.2021.06.051 .

Yadav , K. ( 2022 ), “ The complete practical guide to topic modelling ”, (online) Medium. (online) , available at: https://towardsdatascience.com/topic-modelling-f51e5ebfb40a

Further reading

Akoglu , L. , Tong , H. and Koutra , D. ( 2015 ), “ Graph-based anomaly detection and description: a survey ”, Data Mining and Knowledge Discovery , Vol. 29 No. 3 , pp. 626 - 688 , ( online ), doi: 10.1007/s10618-014-0365-y .

Bao , Y. , Ke , B. , Li , B. , Yu , Y.J. and Zhang , J. ( 2020 ), “ Detecting accounting fraud in publicly traded US firms using a machine learning approach ”, Journal of Accounting Research , Vol. 58 No. 1 , pp. 199 - 235 , ( online ), doi: 10.1111/1475-679X.12292 .

Chowdhury , M. , Rahman , A. and Islam , R. ( 2017 ), “ Malware analysis and detection using data mining and machine learning classification ”, International conference on applications and techniques in cyber security and intelligence , ( online ), pp. 266 - 274 doi: 10.1007/978-3-319-67071-3_33 .

Dorminey , J. , Fleming , A.S. , Kranacher , M.J. and Riley , R.A. Jr. ( 2012 ), “ The evolution of fraud theory ”, Issues in Accounting Education , Vol. 27 No. 2 , pp. 555 - 579 , ( online ), doi: 10.2308/iace-50131 .

Ebrahim , S.A. , Poshtan , J. , Jamali , S.M. and Ebrahim , N.A. ( 2020 ), “ Quantitative and qualitative analysis of time-series classification using deep learning ”, IEEE Access , Vol. 8 , pp. 90202 - 90215 , ( online ), doi: 10.1109/access.2020.2993538 .

Eilifsen , A. and Messier , W.F. Jr ( 2015 ), “ Materiality guidance of the major public accounting firms ”, Auditing: A Journal of Practice and Theory , Vol. 34 No. 2 , pp. 3 - 26 , ( online ), doi: 10.2308/ajpt-50882 .

Gezer , A. , Warner , G. , Wilson , C. and Shrestha , P. ( 2019 ), “ A flow-based approach for Trickbot banking Trojan detection ”, Computers and Security , Vol. 84 , pp. 179 - 192 , ( online ), doi: 10.1016/j.cose.2019.03.013 .

Goodall , J.R. , Ragan , E.D. , Steed , C.A. , Reed , J.W. , Richardson , G.D. , Huffer , K.M. , Bridges , R.A. and Laska , J.A. ( 2018 ), “ Situ: identifying and explaining suspicious behavior in networks ”, IEEE Transactions on Visualization and Computer Graphics , Vol. 25 No. 1 , pp. 204 - 214 , ( online ), doi: 10.1109/TVCG.2018.2865029 .

Habeeb , R.A.A. , Nasaruddin , F. , Gani , A. , Hashem , I.A.T. , Ahmed , E. and Imran , M. ( 2019 ), “ Real-time big data processing for anomaly detection: a survey ”, International Journal of Information Management , Vol. 45 , pp. 289 - 307 .

Jeppesen , K.K. ( 2019 ), “ The role of auditing in the fight against corruption ”, The British Accounting Review , Vol. 51 No. 5 , p. 100798 , ( online ), doi: 10.1016/j.bar.2018.06.001 .

Leangarun , T. , Tangamchit , P. and Thajchayapong , S. ( 2018 ), “ Stock price manipulation detection using generative adversarial networks ”, 2018 IEEE Symposium Series on Computational Intelligence (SSCI) , IEEE , ( online ), pp. 2104 - 2111 , doi: 10.1109/SSCI.2018.8628777 .

Li , T. , Kou , G. , Peng , Y. and Philip , S.Y. ( 2021 ), “ An integrated cluster detection, optimization, and interpretation approach for financial data ”, IEEE Transactions on Cybernetics , Vol. 52 No. 12 , pp. 13848 - 13861 , ( online ), doi: 10.1109/TCYB.2021.3109066 .

Madahali , L. and Hall , M. ( 2020 ), “ Application of the Benford’s law to social bots and information operations activities ”, 2020 International Conference on Cyber Situational Awareness, Data Analytics and Assessment (CyberSA) , IEEE , ( online ), pp. 1 - 8 , doi: 10.1109/CyberSA49311.2020.9139709 .

Monamo , P. , Marivate , V. and Twala , B. ( 2016 ), “ Unsupervised learning for robust bitcoin fraud detection ”, 2016 Information Security for South Africa (ISSA) , IEEE , ( online ), pp. 129 - 134 , doi: 10.1109/ISSA.2016.7802939 .

Perols , J. ( 2011 ), “ Financial statement fraud detection: an analysis of statistical and machine learning algorithms ”, AUDITING: A Journal of Practice and Theory , Vol. 30 No. 2 , pp. 19 - 50 , ( online ), doi: 10.2308/ajpt-50009 .

Rosner , R.L. ( 2003 ), “ Earnings manipulation in failing firms ”, Contemporary Accounting Research , Vol. 20 No. 2 , pp. 361 - 408 .

The Economist ( 2014 ), “ The dozy watchdogs ”, ( online ), available at: www.economist.com/briefing/2014/12/11/the-dozy-watchdogs

Trompeter , G. and Wright , A. ( 2010 ), “ The world has changed—have analytical procedure practices? ”, Contemporary Accounting Research , Vol. 27 No. 2 , pp. 669 - 700 , ( online ), doi: 10.1111/j.1911-3846.2010.01023_8.x .

Uuganbayar , G. , Yautsiukhin , A. , Martinelli , F. and Massacci , F. ( 2021 ), “ Optimization of cyber insurance coverage with the selection of cost-effective security controls ”, Computers and Security , Vol. 101 , p. 102121 , ( online ), doi: 10.1016/j.cose.2020.102121 .

West , J. and Bhattacharya , M. ( 2016 ), “ Intelligent financial fraud detection: a comprehensive review ”, Computers and Security , Vol. 57 , pp. 47 - 66 , ( online ), doi: 10.1016/j.cose.2015.09.005 .

Zhou , C. and Paffenroth , R.C. ( 2017 ), “ Anomaly detection with robust deep autoencoders ”, Proceedings of the 23rd ACM SIGKDD International Conference on Knowledge Discovery and Data Mining , ( online ), doi: 10.1145/3097983.3098052 .

Financial fraud detection through the application of machine learning techniques: a literature review

Humanities and Social Sciences Communications, volume 11, Article number: 1130 (2024)

Financial fraud negatively impacts organizational administrative processes, particularly affecting owners and/or investors seeking to maximize their profits. Addressing this issue, this study presents a literature review on financial fraud detection through machine learning techniques. The PRISMA and Kitchenham methods were applied, and 104 articles published between 2012 and 2023 were examined. These articles were selected based on predefined inclusion and exclusion criteria and were obtained from databases such as Scopus, IEEE Xplore, Taylor & Francis, SAGE, and ScienceDirect. The selected articles were analyzed in terms of their authors, sources, countries, trends, and the datasets used in the experiments to detect financial fraud and its existing types, together with the machine learning models applied and the metrics used to assess their performance. The analysis indicated a trend toward using real datasets. Notably, detection models targeting credit card loan fraud are the most widely used. The information obtained by different authors was acquired from the stock exchanges of China, Canada, the United States, Taiwan, and Tehran, among others. Furthermore, the usage of synthetic data has been low (less than 7% of the employed datasets). Among the leading contributors to the studies, China, India, Saudi Arabia, and Canada remain prominent, whereas Latin American countries have few related publications.

Introduction

Financial fraud represents a highly significant problem, resulting in grave consequences across business sectors and impacting people’s daily lives (Singh et al., 2022 ). Its occurrence leads to reduced confidence in the economy, resulting in destabilization and direct economic repercussions for stakeholders (Reurink, 2018 ). Abdallah et al. ( 2016 ) define fraud as a criminal act aimed at obtaining money unlawfully. There are diverse types of fraud, such as asset misappropriation, expense reimbursement, and financial statement manipulation. Scholars have classified fraud into three categories: banking, corporate, and insurance (Ali et al., 2022 ; Nicholls et al., 2021 ; West and Bhattacharya, 2016 ).

The scale of the problem is evident in the 2022 PricewaterhouseCoopers survey report, which reveals that 56% of companies globally have fallen victim to some form of fraud. In Latin America, 32% of companies have experienced fraud (PricewaterhouseCoopers, 2022 ). These alarming statistics align with the findings from Klynveld Peat Marwick Goerdeler (KPMG), indicating that 83% of the surveyed executives reported being targeted by cyber-attacks in the past 12 months. Furthermore, 71% had encountered some type of internal or external fraud (KPMG, 2022 ). These survey results reveal the higher risks of financial fraud faced by companies in Latin America, the United States, and Canada. In this context, traditional approaches and techniques, as well as manual methods, have lost relevance and effectiveness because they cannot cope with the complexity and scale of the information involved in detecting financial fraud.

Despite the interest of organizations in detecting financial fraud using machine learning (ML), current knowledge in this field remains limited. An initial review of the specialized literature shows that most researchers have directed their efforts toward the analysis of credit card fraud using a supervised approach (Femila Roseline et al., 2022 ; Madhurya et al., 2022 ; Plakandaras et al., 2022 ; Saragih et al., 2019 ). In the studies of Ali et al. ( 2022 ), Hilal et al. ( 2022 ), and Ramírez-Alpízar et al. ( 2020 ), ML techniques employing the supervised approach were found to be the most widely used method for detecting financial fraud, compared to the unsupervised, deep learning, reinforcement, and semi-supervised approaches, among others. Moreover, scholars such as Whiting et al. ( 2012 ) have compared the performance of data mining models for detecting fraudulent financial statements using data from quarterly and annual financial indexes of public companies from the COMPUSTAT database.

Reurink ( 2018 ) has analyzed financial fraud resulting from false financial reports, scams, and misleading financial sales in the context of the financial market. Just like Wadhwa et al. ( 2020 ), he presented a wide variety of data mining methods, approaches, and techniques used in fraud detection, in addition to research addressing online banking fraud (Zhou et al., 2018 ; Moreira et al., 2022 ; Srokosz et al., 2023 ) and financial statement fraud (S. Chen, 2016 ; Ramírez-Alpízar et al., 2020 ). The abovementioned research works show that the accuracy of ML techniques in developing models for detecting financial fraud has increased (Al-Hashedi and Magalingam, 2021 ).

The effectiveness of financial fraud detection and prevention depends on selecting appropriate ML techniques to identify new threats and minimize false alarms, thereby responding to the negative impact of financial fraud on organizations (Ahmed et al., 2016 ). The use of ML techniques has made it possible to identify patterns and anomalies in large financial data sets. However, issues related to the development of detection tools, inaccurate classification, detection methods, privacy, computing performance, and disproportionate misclassification costs continue to hinder the accurate and timely detection of financial fraud (Dantas et al., 2022 ; Mongwe and Malan, 2020 ; Nicholls et al., 2021 ; West and Bhattacharya, 2016 ).

Recently, several studies have reviewed financial statement fraud detection methods in data mining and ML (Gupta and Mehta, 2021 ; Shahana et al., 2023 ); however, the present study is different from these past works in the area. These authors established the types of financial fraud and the different data mining techniques and approaches used to detect financial statement fraud. In contrast, our study explains the trends in the use of ML approaches and techniques to detect financial fraud, and it presents the more frequently used datasets in the literature for conducting experiments.

Fraud detection mechanisms using machine learning techniques help detect unusual transactions and prevent cybercrime (Polak et al., 2020 ). Although each of these approaches uses different methods in its experimentation, a systematic literature review (SLR) shows that each algorithm is evaluated with performance metrics that determine how accurately it predicts that a financial transaction is fraudulent. Such metrics include Accuracy, Precision, F1 Score, Recall, and Sensitivity, among others.
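To make these metrics concrete, the following minimal sketch computes them from the confusion-matrix counts of a binary fraud classifier. The function name and the toy labels are illustrative assumptions and are not taken from any of the reviewed studies. Note that Recall and Sensitivity are two names for the same quantity, the share of actual fraud cases that the model flags.

```python
def classification_metrics(y_true, y_pred):
    """Compute common binary-classification metrics (1 = fraud, 0 = legitimate)."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == 1 and p == 1)  # fraud correctly flagged
    tn = sum(1 for t, p in pairs if t == 0 and p == 0)  # legitimate correctly passed
    fp = sum(1 for t, p in pairs if t == 0 and p == 1)  # false alarm
    fn = sum(1 for t, p in pairs if t == 1 and p == 0)  # missed fraud

    total = tp + tn + fp + fn
    accuracy = (tp + tn) / total if total else 0.0
    precision = tp / (tp + fp) if (tp + fp) else 0.0
    recall = tp / (tp + fn) if (tp + fn) else 0.0  # identical to sensitivity
    f1 = (2 * precision * recall / (precision + recall)
          if (precision + recall) else 0.0)
    return {"accuracy": accuracy, "precision": precision,
            "recall": recall, "f1": f1}


# Hypothetical labels for eight transactions (illustration only).
y_true = [1, 0, 0, 1, 0, 0, 1, 0]
y_pred = [1, 0, 1, 0, 0, 0, 1, 0]
metrics = classification_metrics(y_true, y_pred)
print(metrics)
```

Because fraud datasets are usually highly imbalanced, accuracy alone can look high even when most fraud cases are missed, which is one reason precision, recall, and F1 are typically reported alongside it.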

The present research uses a rigorous and well-structured methodology to expand current knowledge on financial fraud detection using machine learning (ML) techniques. Through a systematic literature review that follows adaptations of the PRISMA guidelines and Kitchenham’s methodology, the study ensures a carefully planned and transparent review process. The sources of information consulted include research articles published in reputable academic databases such as Scopus, IEEE Xplore, Taylor & Francis, SAGE, and ScienceDirect, ensuring that the review covers relevant, high-quality scientific literature in the field of financial fraud and machine learning. Moreover, the study includes a bibliometric analysis using VOSviewer software, which allows trends and patterns within the literature to be identified both quantitatively and visually. Based on the 104 articles reviewed, which cover the period 2012–2023, we describe the types of fraud, the models applied, the ML techniques used, the datasets employed, and the performance metrics reported. These results help fill the existing gaps in the literature by providing a comprehensive and up-to-date synthesis of the evidence on the use of machine learning techniques for financial fraud detection, thus laying the groundwork for future research and practical applications in this field.

Our answers to the initial research questions yield four main contributions that justify this research. This study contributes to the literature on financial fraud detection by examining the relationship between the current literature on financial fraud detection and ML based on the scholars, articles, countries, journals, and trends in the area. Fraud is classified as internal and external, with a focus on credit card loan fraud investigations and insurance fraud. The different ML techniques and the models applied in experiments are grouped. The most widely used datasets in financial fraud detection using ML are analyzed according to the 86 articles that contained experiments, highlighting that most of them involve real data. This paper is also useful for researchers because it studies and presents the metrics used in supervised and unsupervised learning experiments, providing a clear view of their application in the different models.

Therefore, this study is relevant because it presents in a consolidated and updated manner new contributions derived from experiment results regarding the use of ML, which helps address the problem when financial fraud occurs.

The research work is organized as follows: the section “Methods” comprehensively describes the research method and the questions addressed in the study. Section “Results of the data synthesis” presents the findings encompassing authors, articles, sources, countries, trends, financial fraud types, and datasets with their characteristics to which the detection models using ML techniques were applied, with the results of their metrics. Finally, the section “Discussion and conclusion” highlights the conclusions, including future lines of research in the field.

Methods

The study is based on an SLR, which provides a comprehensive view of the main developments in financial fraud detection. Given this purpose, the literature review followed the scientific guidelines of the PRISMA and Kitchenham methods, as adapted by several authors (Ashtiani and Raahemi, 2022 ; Kitchenham and Brereton, 2013 ; Kitchenham and Stuart, 2007 ; Kumbure et al., 2022 ; Moher et al., 2009 ; Roehrs et al., 2017 ; Saputra et al., 2023 ; Wohlin, 2014 ).

The method used in the SLR was developed with carefully planned and executed activities: (a) planning of the review, (b) definition of research questions, (c) description of the search strategy, (d) consultation concerning the search strategy, (e) selection of the inclusion/exclusion criteria and data selection, (f) description of the quality assessment, (g) investigation of the study topics, (h) description of data extraction, and (i) synthesis of the data.

Each of the activities conducted in this study is explained below.

Planning of the review

The research purpose was established in accordance with the indicated research goals and questions. The analysis focused on research articles published between 2012 and 2023, particularly those using ML methods for financial fraud detection. Accordingly, the SLR procedure presented by Kitchenham and Stuart ( 2007 ) and Moher et al. ( 2009 ) was implemented following a series of steps adapted and modified by Ashtiani and Raahemi ( 2022 ) and Kumbure et al. ( 2022 ), as depicted in Fig. 1 . Thus, it was possible to ensure a rigorous and objective analysis of the available literature in our field of interest.

Fig. 1. Description of the general process used to review the literature in the study area. Authors’ own elaboration.

The procedures implemented in this review process are discussed in the following subsections.

Definition of research questions

In SLR, research questions are key and decisive for the success of the study (Kitchenham and Stuart, 2007 ). Therefore, analyzing the existing literature on financial fraud detection through ML techniques and its characteristics, problems, challenges, solutions, and research trends is crucial. Table 1 describes the research questions to provide a structured framework for the study.

Within the proposed systematic review, the questions were fine-tuned, achieving a better classification and thematic analysis. The research questions were categorized into two groups: general questions (GQ) and specific questions (SQ). GQs provide an overview of the current state of the art, that is, a general framework for future research. Meanwhile, SQs focus on specific matters emerging from the application areas of the topic, thereby improving the filtering process of the study.

Description of the search strategy

The search strategy was designed to identify a set of studies addressing the research questions posed. This strategy was implemented in two stages. In the first stage, a manual search was conducted by selecting a set of test documents through a defined database. Following the strategy proposed by Wohlin ( 2014 ), a snowball search was conducted. This approach involved starting from a set of initial references (e.g., relevant articles or books addressing the subject matter) and using them to search for new related references relevant to the study.

In the second stage, an automated search was performed using the technique described by Kitchenham and Brereton ( 2013 ), which included preparing a list of the main search terms to be applied in the queries in each database, as indicated in subsection “Search queries”.

Manual search

In the study’s initial stage, nine journal articles were selected from the test set of papers (Ahmed et al., 2016 ; Ali et al., 2022 ; Bakumenko and Elragal, 2022 ; Gupta and Mehta, 2021 ; Hilal et al., 2022 ; Nicholls et al., 2021 ; Nonnenmacher and Marx Gómez, 2021 ; Ramírez-Alpízar et al., 2020 ; West and Bhattacharya, 2016 ). The manual literature search helped identify articles related to financial fraud detection through ML techniques, which were used as an initial set and were part of the final analysis. In the subsequent stage, a backward and forward snowball search was conducted. This approach involved using the initial set to select the relevant articles.

The backward snowball search process comprised reviewing the titles of the articles cited by the initial set and including those meeting the inclusion and exclusion criteria. In the forward snowball search, the analysis was performed in the Scopus database to identify studies citing one or more of the articles in the initial set. This filtering method helped identify studies meeting the inclusion and exclusion criteria, eliminate duplicates from the previous set, and analyze articles answering the questions posed, which were retained in the final study set.
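The backward and forward snowball steps can be sketched as a single pass over a citation graph. This is only an illustration under stated assumptions: the `references` mapping, the paper identifiers, the publication years, and the `meets_criteria` predicate are hypothetical, and in the study the forward step was carried out through the Scopus database rather than a local graph.

```python
def snowball(seed_ids, references, meets_criteria):
    """One round of backward and forward snowballing.

    references: dict mapping a paper id to the set of paper ids it cites.
    meets_criteria: callable returning True if a paper passes the
    inclusion/exclusion criteria (hypothetical stand-in for manual screening).
    """
    seeds = set(seed_ids)

    # Backward: papers cited by at least one seed article.
    backward = set()
    for seed in seeds:
        backward |= references.get(seed, set())

    # Forward: papers that cite at least one seed article.
    forward = {pid for pid, cited in references.items() if cited & seeds}

    # Drop papers already in the initial set, then apply the criteria.
    candidates = (backward | forward) - seeds
    return {pid for pid in candidates if meets_criteria(pid)}


# Toy citation graph and publication years (illustration only).
references = {
    "seed_a": {"p1", "p2"},
    "seed_b": {"p2"},
    "p3": {"seed_a"},
    "p4": {"p1"},
    "p5": {"seed_a", "seed_b"},
}
years = {"p1": 2010, "p2": 2015, "p3": 2020, "p4": 2021, "p5": 2019}

selected = snowball(
    ["seed_a", "seed_b"],
    references,
    lambda pid: 2012 <= years.get(pid, 0) <= 2023,
)
print(sorted(selected))
```

Using sets makes duplicate removal implicit: a paper reached through both the backward and the forward step appears only once in the result.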

Automated search

The research work mainly aimed to obtain a reliable set of relevant studies to minimize bias and increase the validity of the results. To this end, the manual search assessed the abstracts and other sections of articles against the inclusion and exclusion criteria. To complement it, we implemented an automated search strategy using five databases: Scopus, IEEE Xplore, Taylor & Francis, SAGE, and ScienceDirect, known for their impartiality in the representation of research works, applying the inclusion and exclusion criteria already defined. Thus, a final set of 104 related articles meeting the established criteria was identified.

Search queries

Studies from 2012 onward were reviewed with keywords such as “financial fraud” and “machine learning” to identify model-based approaches and associated techniques. Table 2 presents a summary of the queries used in each data source.

Inclusion and exclusion criteria and study selection

The study established inclusion and exclusion criteria, a key process to select the most relevant articles. Documents published between 2012 and 2023 (until March) were considered, and document types such as conference reviews, book chapters, editorials, and reviews were excluded. Further, the availability of the full text of the article was considered. We decided to exclude articles published before 2012 for the following reasons: (i) They were over 11 years old; (ii) Relevant publications prior to 2012 were scarce; and (iii) A sufficient number of articles were available between 2012 and 2023.

For the inclusion and exclusion criteria, appropriate filtering tools were applied to each data source during the search stage. This enabled the automated selection of the most relevant and appropriate studies based on the research goal.
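As an illustration of how such criteria can be applied automatically, the sketch below filters a list of bibliographic records by publication year, document type, and full-text availability. The record fields (`year`, `doc_type`, `full_text_available`) and the sample records are hypothetical, the March 2023 cut-off is simplified to a year check, and the actual filtering in the study was done with each database's own tools.

```python
# Document types named as excluded in the text (lower-cased for matching).
EXCLUDED_TYPES = {"conference review", "book chapter", "editorial", "review"}


def is_included(record, first_year=2012, last_year=2023):
    """Return True if a record passes the illustrative inclusion/exclusion checks."""
    in_window = first_year <= record["year"] <= last_year
    allowed_type = record["doc_type"].lower() not in EXCLUDED_TYPES
    return in_window and allowed_type and record["full_text_available"]


# Hypothetical records (illustration only).
records = [
    {"title": "A", "year": 2019, "doc_type": "Article", "full_text_available": True},
    {"title": "B", "year": 2010, "doc_type": "Article", "full_text_available": True},
    {"title": "C", "year": 2021, "doc_type": "Editorial", "full_text_available": True},
    {"title": "D", "year": 2022, "doc_type": "Article", "full_text_available": False},
]

kept = [r["title"] for r in records if is_included(r)]
print(kept)
```

Keeping the criteria in one small predicate makes the selection reproducible: the same function can be rerun on the export of every database.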

Data processing strategies

In the data processing strategy used, databases were selected following strict inclusion and exclusion criteria to ensure the quality and relevance of the information collected (Table 3 ). The initial search identified the following numbers of relevant articles in each database: Scopus (28), Taylor & Francis (80), SAGE (71), ScienceDirect (663), and IEEE Xplore (5132). This initial step provides a broad overview of the available literature in the field of financial fraud detection using ML models.

Subsequently, a removal phase was carried out to ensure data integrity, in which the following numbers of articles (given in parentheses) were removed from each database: Scopus (0), Taylor & Francis (63), SAGE (57), ScienceDirect (636), and IEEE Xplore (5114). This rigorous process ensures the integrity of the data collected and avoids redundancy.

The final step consisted of obtaining the consolidated number of articles included after the selection and the exclusion of duplicates: Scopus (28), Taylor & Francis (17), SAGE (14), ScienceDirect (27), and IEEE Xplore (18). This methodological strategy ensured that the articles retained for the complete analysis were relevant to the field of financial fraud detection using ML models.
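The counts reported for the three steps can be checked with a short calculation. The sketch below uses only the figures stated in the text; the variable names are illustrative.

```python
# Articles identified in the initial search, per database (from the text).
identified = {
    "Scopus": 28,
    "Taylor & Francis": 80,
    "SAGE": 71,
    "ScienceDirect": 663,
    "IEEE Xplore": 5132,
}

# Articles removed in the removal phase, per database (from the text).
removed = {
    "Scopus": 0,
    "Taylor & Francis": 63,
    "SAGE": 57,
    "ScienceDirect": 636,
    "IEEE Xplore": 5114,
}

# Articles remaining in each database after removal.
included = {db: identified[db] - removed[db] for db in identified}
total_included = sum(included.values())

print(included)
print(total_included)
```

The per-database differences reproduce the consolidated counts (28, 17, 14, 27, and 18), and their sum matches the 104 articles in the final set.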

Quality assessment

Once the inclusion and exclusion criteria were applied, the remaining articles were assessed for quality. The evaluation criteria used included the purpose of the research; contextualization; literature review; and related works, methods, conclusions, and results. To minimize the empirical obstacles associated with full-text filtering, a set of questions proposed by Roehrs et al. ( 2017 ) (see Table 4 ) was used to validate whether the selected articles met the previously established quality criteria.

Research topics

In conducting the literature review to understand the current state of published research on the topic, a data orientation process was addressed, including preprocessing techniques and ML models and their metrics. Accordingly, four research topics were defined based on the research goals. They are presented in Table 5 .

Data extraction

For data extraction, the necessary attributes were first defined and the information pertaining to the study goals was summarized. Next, the relevant information was identified and obtained through a detailed reading of the full text of each article. The information was then stored in a Microsoft Excel spreadsheet. Data were collected on the attributes specified in Table 6 . In Table 6 , the “Study” column corresponds to the identifiers of the research topics in Quality Assessment, and the “Subject” column refers to the category to which the different attributes belong. The names of the attributes and a brief description are presented in the last two columns of the table, including additional columns with relevant information.

Data synthesis

Data synthesis included analyzing and summarizing the information observed in the selected articles to address the research questions. To perform this task, a synthesis was conducted following the guidelines proposed by Moher et al. ( 2009 ) based on qualitative data. Further, a descriptive analysis was performed to obtain answers to the research questions. Consequently, a qualitative approach to data evidence was followed.

Results of the data synthesis

This section considers the 104 articles in the final selection. The data were synthesized to address the five research questions mentioned.

General questions (GQ)

GQ1: Which were the most relevant authors, articles, sources, countries, and trends in the literature review on financial fraud detection based on the application of machine learning (ML) models?

The literature on financial fraud detection applying ML models has been studied by a large number of authors. However, some authors stood out in terms of the number of published papers and number of citations. Specifically, the most significant authors with two publications are Ahmed M. (with 318 citations), Ileberi E. (82 citations), Ali A. (20 citations), Chen S. (84 citations), and Domashova J and Kripak E. (each with 6 citations). Other relevant authors with one publication and who have been cited several times are Abdallah A. (with 333 citations), Abbasimehr H. (18 citations), Abd Razak S. (13 citations), Achakzai M. A. K. (5 citations), and Abosaq H. (2 citations). The aforementioned authors have contributed significantly to the development of research in financial fraud detection using ML models (Fig. 2 ).

Fig. 2. Analysis of the connections between authors based on co-authorship of publications. Produced with VOSviewer.

Collectively, the researchers have contributed a solid knowledge base and have laid the foundation for future research in financial fraud detection using ML models. Although other researchers contributed to the field, such as Khan, S. and Mishra, B., both with 7 citations, among others, some have been more prominent in terms of the number of papers published. Their collective works have enriched the field and have promoted a greater understanding of the challenges and opportunities in this area.

As depicted in Fig. 3 , clusters 2 (green) and 4 (yellow) present the most relevant research articles on financial fraud detection using ML models. Cluster 2, comprising 9 articles with 357 citations and 32 links, is highlighted because of the significant impact of the articles by Sahin, Huang, and Kim. These articles have the highest number of citations and are deemed to be useful starting points for those intending to dive into this research field. Cluster 4, constituting 6 articles with 158 citations and 27 links, includes the works of Dutta and Kim, who have also been cited considerably.

figure 3

Depicts the connections between articles based on their bibliographic references. Produced with VOSviewer.

Articles in clusters 1 (red) and 3 (dark blue) could also be valuable sources of information, although they have fewer citations and links than those in clusters 2 and 4. Even so, some articles in these clusters have been cited substantially, such as those by Nian K. (62 citations and 4 links) and Olszewski (92 citations and 4 links).

In Cluster 10 (pink), the article by Reurink A. is prominent, with 38 citations. This is followed by the article by Ashtiani M.N. with 10 citations. In Cluster 11 (light green), the article by Hájek P. has 129 citations. In Cluster 12 (grayish blue), the articles by Blaszczynski J. and Elshaar S. have the greatest number of citations, indicating their influence in the field of financial fraud detection.

In Cluster 13 (light brown), the article by Pourhabibi T. has the greatest number of citations at 102, suggesting that it has been relevant in the research on financial fraud detection. Finally, in Cluster 14 (purple), the article by Seera M. has 63 citations and 2 links, and the article by Ileberi E. has 11 citations and 1 link. Both articles have a comparatively small number of citations, indicating a lower influence on the topic.

In conclusion, clusters 2, 4, and 11 are the most relevant in this literature review. The articles by Sahin, Huang, Kim, Dutta, and Pumsirirat are the most influential ones in the research on financial fraud detection through the application of ML models.

The information presented in Fig. 4 is the result of a clustering analysis of the articles resulting from the literature review on financial fraud detection by implementing ML models. In total, 48 items were identified and grouped into 12 clusters. The links between the items were 100, with a total link strength of 123.

figure 4

Shows the relationship between different scientific journals based on bibliographic links. Produced with VOSviewer.

The following is a description of each cluster with its respective number of items, links, and total link strength (the number of times a link appears between two items and its strength):

Cluster 1 (6 articles—red): This cluster includes journals such as Computers and Security , Journal of Network and Computer Applications , and Journal of Advances in Information Technology . The total number of links is 27, and the total link strength is 32.

Cluster 2 (6 articles—dark green): This cluster includes articles from Technological Forecasting and Social Change , Journal of Open Innovation: Technology, Market, and Complexity , and Global Business Review . The total number of links is 18, and the total link strength is 19.

Cluster 3 (5 articles—dark blue): This cluster includes articles from the International Journal of Advanced Computer Science and Applications , Decision Support Systems , and Sustainability . The total number of links is 19, and the total link strength is 20.

Cluster 4 (4 articles—dark yellow): This cluster includes articles from Expert Systems with Applications and Applied Artificial Intelligence . The total number of links is 26, and the total link strength is 45.

Cluster 5 (4 articles—purple): This cluster includes articles from Future Generation Computer Systems and the International Journal of Accounting Information Systems . The total number of links is 15, and the total link strength is 16.

Cluster 6 (4 articles—dark blue): This cluster includes articles from IEEE Access and Applied Intelligence . The total number of links is 18, and the total link strength is 26.

Cluster 7 (4 articles—orange): This cluster includes articles from Knowledge-Based Systems and Mathematics . The total number of links is 23, and the total link strength is 29.

Cluster 8 (4 articles—brown): This cluster includes articles from the Journal of King Saud University—Computer and Information Sciences and the Journal of Finance and Data Science . The total number of links is 13, and the total link strength is 13.

Cluster 9 (4 articles—light purple): This cluster includes articles from the International Journal of Digital Accounting Research and Information Processing and Management . The total number of links is 2, and the total link strength is 2.

The clusters represent groups of related articles published in different academic journals. Each cluster has a specific number of articles, links, and total link strength. These findings provide an overview of the distribution and connectedness of articles in the literature on financial fraud detection using ML models. Further, clustering helps identify patterns and common thematic areas in the research, which may be useful for future researchers seeking to explore this field.
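The two quantities reported for each cluster can be illustrated with a small sketch. It assumes the usual bibliometric reading in which an item's number of links is the count of other items it is connected to, and its total link strength is the sum of the strengths of those links. The journal names and strengths below are hypothetical and are not taken from Fig. 4.

```python
from collections import defaultdict

# Hypothetical bibliographic-coupling links: (item_a, item_b, link_strength).
# Each unordered pair is assumed to appear only once in the list.
edges = [
    ("Journal A", "Journal B", 3),
    ("Journal A", "Journal C", 1),
    ("Journal B", "Journal C", 2),
    ("Journal C", "Journal D", 1),
]

def link_summary(edge_list):
    """Return per-item (links, total link strength) dictionaries."""
    links = defaultdict(int)
    strength = defaultdict(int)
    for a, b, weight in edge_list:
        for item in (a, b):
            links[item] += 1          # one more neighbor for this item
            strength[item] += weight  # accumulate the strength of the link
    return dict(links), dict(strength)

links, strength = link_summary(edges)
for item in sorted(links):
    print(f"{item}: links={links[item]}, total link strength={strength[item]}")
```

An item with many weak links and an item with a few strong links can therefore show the same total link strength, which is why both figures are reported side by side.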

Clusters 1, 4, and 7 have the most links and the highest total link strengths. These clusters encompass articles from Computers and Security, Expert Systems with Applications, and Knowledge-Based Systems, which are important sources for the SLR on financial fraud detection through the implementation of ML models.

The analysis also covers the number of documents by country and territory, listing 50 countries/territories with the number of documents attributed to each. China leads with the highest paper count at 18, followed by India at 13 and Saudi Arabia and Canada at 9 each. Malaysia, Pakistan, South Africa, the United Kingdom, France, Germany, and Russia have similar research outputs with 4–9 papers. Sweden and Romania have 1 or 2 research papers, indicating limited scientific research output.

The presence of countries that appear less often in the academic literature, such as Armenia, Costa Rica, and Slovenia, suggests ongoing research in less common places. Beyond the leading countries, the number of papers per country gradually decreases.

The production of papers is geographically distributed across countries from different continents and regions. However, more research exists on the subject from countries with developed and transition economies, which allows for a greater capacity to conduct research and produce papers.

Figure 5 , sourced from Scopus’s “Analyze search results” option, depicts countries with their respective number of published papers on the topic of financial fraud detection through ML models.

figure 5

Represents the number of scientific publications in the study area classified by country. Produced with VOSviewer.

The above shows the diversity of countries involved in the research, where China leads the number of studies with 18 papers, followed by India with 13 and Saudi Arabia and Canada each with 9 papers. The other countries show little production, with fewer than 7 publications each, which points to an emerging topic of interest for the survival of companies that must prevent and detect different financial frauds using ML techniques.

The most relevant keywords in the review of literature on financial fraud detection implementing ML models include the following:

In Cluster 1, the most relevant keywords are “decision trees” (13 repetitions), “support vector machine (SVM)” (11 repetitions), “machine-learning” (10 repetitions), and “credit card fraud detection” (9 repetitions). These keywords show a special focus on artificial intelligence, particularly ML, and on supervised learning algorithms such as decision trees and support vector machines applied to credit card fraud detection.

In Cluster 2, the most relevant keywords are “crime” (46 repetitions), “fraud detection” (43 repetitions), and “learning systems” (13 repetitions). These terms reflect a broader focus on financial fraud detection, where the aspects of crime in general, fraud detection, and learning systems used for this purpose have been addressed.

In Cluster 3, the most relevant keywords are “Finance” (19 repetitions), “Data Mining” (18 repetitions), and “Financial Fraud” (12 repetitions). These keywords indicate a focus on the financial industry, where data mining is used to reveal patterns and trends related to financial fraud.

In Cluster 4, the most relevant keywords are “Machine Learning” (45 repetitions), “Anomaly Detection” (16 repetitions), and “Deep Learning” (11 repetitions). They reflect an emphasis on the use of traditional ML and deep learning techniques for anomaly detection and financial fraud detection.

In general, the different clusters indicate the most relevant keywords in the SLR on financial fraud detection through ML models. Each cluster presents a specific set of keywords reflecting the most relevant trends and approaches in this field of research (Fig. 6 ).

figure 6

Shows the relationships between keywords based on their co-occurrence in the literature reviewed. Produced with VOSviewer.
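The repetition counts and co-occurrence relationships described for the keyword clusters can be sketched as follows. This is only an illustration of the counting step, under the assumption that a keyword is counted once per paper and that two keywords co-occur when they are listed in the same paper. The keyword lists are hypothetical and are not the data behind Fig. 6.

```python
from collections import Counter
from itertools import combinations

# Hypothetical author keywords for four papers (illustrative only).
papers = [
    ["fraud detection", "machine learning", "decision trees"],
    ["fraud detection", "machine learning", "anomaly detection"],
    ["fraud detection", "crime", "learning systems"],
    ["machine learning", "deep learning", "anomaly detection"],
]

def keyword_stats(keyword_lists):
    """Count keyword occurrences and pairwise co-occurrences, once per paper."""
    occurrences = Counter()
    co_occurrences = Counter()
    for keywords in keyword_lists:
        unique = sorted({k.lower() for k in keywords})
        occurrences.update(unique)
        # Every unordered pair of keywords in the same paper is one co-occurrence.
        co_occurrences.update(combinations(unique, 2))
    return occurrences, co_occurrences

occurrences, co_occurrences = keyword_stats(papers)
print(occurrences.most_common(3))
print(co_occurrences.most_common(3))
```

Clustering tools then group keywords that co-occur frequently, which is how thematic clusters such as those above emerge from the raw counts.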

GQ2: What types of financial fraud have been identified in ML studies?

Financial fraud is generated by weaknesses in companies’ control mechanisms, which are analyzed based on the variables that allow them to materialize. These include opportunity, motivation, self-fulfillment, capacity, and pressure. Some of these are comprehensively analyzed by Donald Cressey through the fraud theory approach. The lack of modern controls has led organizations to use ML in response to this major problem. According to the findings of the Global Economic Crime and Fraud Survey 2022–2023, which gathered insights from 1,028 respondents across 36 countries worldwide, instances of fraud within these companies have caused a financial loss of approximately 10 million dollars (PricewaterhouseCoopers, 2022 ).

Referring to the concept of fraud, as outlined in international studies (Estupiñán Gaitán, 2015 ; Márquez Arcila, 2019 ; Montes Salazar, 2019 ) and the guidelines of the American Institute of Certified Public Accountants, it is an illegal, intentional act in which there is a victim (someone who loses a financial resource) and a victimizer (someone who obtains a financial resource from the victim). Thus, the proposed classification includes corporate fraud and/or fraud in organizations, considering that the purpose is to misappropriate the capital resources of an entity or individual: cash, bank accounts, loans, bonds, stocks, real estate, and precious metals, among others.

In this SLR study, we have considered fraud classifications by authors of 86 articles, which encompass experiments. We have excluded the 18 SLR articles from our analysis. The types presented in Table 7 follow the holistic view of the authors of the research for a better understanding of the subject of financial fraud, considering whether it is internal or external fraud.

Table 7 highlights the diverse types of frauds, and the research works on them. According to the classification, external frauds correspond to those performed by stakeholders outside the company. This study’s findings show that 54% of the analyzed articles investigate external fraud, among which the most important studies are on credit card loan fraud, followed by insurance fraud, using supervised and unsupervised ML techniques for their detection.

In research works (Kumar et al., 2022 ) analyzing credit card fraud, attention is drawn to the importance of prevention through the behavioral analysis of customers who acquire a bank loan and identifying applicants for bad loans through ML models. The datasets used in these fraud studies have covered transactions performed by credit card holders (Alarfaj et al., 2022 ; Baker et al., 2022 ; Hamza et al., 2023 ; Madhurya et al., 2022 ; Ounacer et al., 2018 ; Sahin et al., 2013 ), while other research works have covered master credit card money transactions in different countries (Wu et al., 2023 ) and fraudulent transactions gathered from 2014 to 2016 by the international auditing firm Mazars (Smith and Valverde, 2021 ).

The second major type of external fraud is insurance fraud, which is classified into two groups. The first is fraud in health insurance programs, involving practices such as document forgery, fraudulent billing, and false medical prescriptions (Sathya and Balakumar, 2022; Van Capelleveen et al., 2016). The second is automobile insurance fraud, involving fraudulent actions between policyholders and repair shops, who rely on each other to obtain benefits (Aslam et al., 2022; Nian et al., 2016; Subudhi and Panigrahi, 2020). In response to these issues, insurance companies have developed robust models using ML.

As regards internal fraud, caused by an individual within the company, 46% of studies have analyzed this type, with financial statement fraud, money laundering fraud, and tax fraud standing out. The studies show that the investigations are based on information reported by the US Securities and Exchange Commission (SEC) and the stock exchanges of China, Canada, Tehran, and Taiwan, among others. To a considerable extent, the information taken is from the real sector, and very few studies have obtained synthetic information based on the application of different learning models.

The following is a summary of the financial information obtained by the researchers to apply AI models and techniques:

Stock market financial reports : Fraud in the Canadian securities industry (Lokanan and Sharma, 2022 ), companies listed on the Chinese stock exchanges (Achakzai and Juan, 2022 ; Y. Chen and Wu, 2022 ; Xiuguo and Shengyong, 2022 ), companies with shares according to the SEC (Hajek and Henriques, 2017 ; Papík and Papíková, 2022 ), companies listed on the Tehran Stock Exchange (Kootanaee et al. 2021 ), companies in the Taiwan Economic Journal Data Bank (TEJ) stock market (S. Chen, 2016 ; S. Chen et al., 2014 ), analysis of SEC accounting and auditing publications (Whiting et al., 2012 )

Wrong financial reporting to manipulate stock prices (Chullamonthon and Tangamchit, 2023 ; Khan et al., 2022 ; Zhao and Bai, 2022 )

Financial data of 2318 companies with the highest number of financial frauds (mechanical equipment, medical biology, media, and chemical industries; Shou et al., 2023 ), fraudulent financial restatements (Dutta et al., 2017 )

Data from 950 companies in the Middle East and North Africa region (Ali et al., 2023 ), analyzing outliers in sampling risk and inefficiency of general ledger financial auditing (Bakumenko and Elragal, 2022 ), fraudulent intent errors by top management of public companies (Y. J. Kim et al., 2016 ), reporting of general ledger journal entries from an enterprise resource planning system (Zupan et al., 2020 )

Synthetic financial dataset for fraud detection (Alwadain et al., 2023 ).

Studies have analyzed situations involving fraudulent financial statements. In these cases, instances of fraud have already occurred, leading to the creation of financial reports that contain statements with outliers that can be deemed fraudulent intent or errors in financial figures. This raises a reasonable doubt about whether an intent exists with regard to the reporting of unrealistic figures. Notably, once there are parties responsible for the financial information presented to stakeholders, such as organization owners, managers, administrators, accountants, or auditors, it is unlikely for it to be unintentional (an error). In this context, transparency and explainability are essential so as to ensure fairness in decisions, thus avoiding bias and discrimination based on prejudiced data (Rakowski et al., 2021 ).

Because of its significance, the information reported in financial statements is vital for investigations. Studies have indicated substantial amounts of data extracted from the financial reports of regulatory bodies such as stock exchanges and auditing firms. These entities use the data to establish the existence of fraud and its types through predictive models that use ML techniques. Thus, they require financial data such as dates, the third party affected, user, debit or credit amount, and type of document, among other aspects involving an accounting record. This information aids in identifying the possible impact in terms of lower profits and the perpetrator and/or perpetrators to gather sufficient evidence and file criminal proceedings for the financial damage caused.

Moreover, investigations concerning money laundering, the second most investigated internal fraud type, encompass the reports of natural and legal persons exposed by the Financial Action Task Force in countries such as the Kingdom of Saudi Arabia (Alsuwailem et al., 2022), transactions from April to September 2018 from Taiwan’s “T” bank and the account watch list of the National Police Agency of the Ministry of Interior (Ti et al., 2022), money laundering frauds in Middle East banks (Lokanan, 2022), transactions of financial institutions in Mexico from January 2020 (Rocha-Salazar et al., 2021), and synthetic data of simulated banking transactions (Usman et al., 2023).

Concerns regarding the entry of proceeds from money laundering into an organization have been articulated in relation to the financial damage it causes to the country. At the macroeconomic level, these activities negatively affect financial stability, distorting the prices of goods and services. Moreover, such activities disrupt markets, making it difficult to make efficient financial decisions. At the microeconomic level, legitimate businesses face unfair competition with companies using illegal money, which may lead to higher unemployment levels. Furthermore, money laundering has a social impact because it affects the security and welfare of society.

Thus, some research works (Alsuwailem et al., 2022) have indicated the need to implement ML models for promoting anti-money laundering measures. For instance, in Saudi Arabia, money from illicit drug trafficking, corruption, counterfeiting, and product piracy has entered the country. The measures to be taken are categorized according to the three stages of money laundering: placement, layering (also known as concealment), and integration. These include new legal regulations against money laundering, staff training, customer identification and validation, reporting of suspicious activities, and documentation and storage of relevant data (Bolgorian et al., 2023).

Regarding the 7.5% incidence of internal fraud, specifically categorized as tax fraud resulting from tax evasion, the studies have analyzed tax returns on income and/or profits of legal persons and/or individuals from the Serbian tax administration during 2016–2017 (Savić et al., 2022 ). Studies have encompassed periodic value-added tax (VAT) returns, together with the anonymous list of clients for the tax year 2014 obtained from the Belgian tax administration (Vanhoeyveld et al., 2020 ) and income tax and VAT taxpayers registered and provided by the State Revenue Committee of the Republic of Armenia in 2018 (Baghdasaryan et al., 2022 ). These studies hold great relevance for tax administrations using different strategies to minimize the impact of fraud resulting from tax evasion. Tax evasion reduces the government’s ability to collect revenue, directly affecting government finances and causing budget deficits, thereby increasing public debt.

GQ3: Which ML models were implemented to detect financial fraud in the datasets?

Given that ML is a key tool to extract meaningful information and make informed decisions, this study analyzes the most widely used ML techniques in the field of financial fraud detection. It takes as reference 86 experimental articles, excluding 18 SLR articles. In these articles, the most commonly used trends and approaches in the implementation of ML techniques in financial fraud detection were identified.

For the analysis, the frequency of use of the ML models was observed. Several of them have been prominent because of their popularity and implementation in detecting financial fraud (Fig. 7). Some of the most widely used models include long short-term memory (LSTM) with 7 mentions, autoencoder with 10 mentions, XGBoost with 13 mentions, k-nearest neighbors (KNN) with 14 mentions, artificial neural network (ANN) with 17 mentions, NB with 19 mentions, SVM with 29 mentions, DT with 29 mentions, LR with 32 mentions, and RF with 34 mentions.

figure 7

Illustrates the most common machine learning models in financial fraud detection. Authors’ own elaboration.

The LSTM model is a recurrent neural network used for sequence processing, especially for tasks concerning natural language processing (Chullamonthon and Tangamchit, 2023 ; Esenogho et al., 2022 ; Femila Roseline et al., 2022 ). Moreover, autoencoders are models used for data compression and decompression. These models are useful in dimensionality reduction applications (Misra et al., 2020 ; Srokosz et al., 2023 ). XGBoost is a library combining multiple weak DT models, offering a scalable and efficient solution in classification and regression tasks (Dalal et al., 2022 ; Udeze et al., 2022 ).

KNN and ANN are widely used models in various ML applications. KNN is based on neighbor closeness, and ANN is inspired by human brain functioning. NB is a probabilistic algorithm commonly used in text classification and data mining (Ashtiani and Raahemi, 2022 ; Lei et al., 2022 ; Shahana et al., 2023 ).

SVM, DT, LR, and RF, the most commonly mentioned models, are used in a wide range of classification and regression applications. These models are prominent because of their effectiveness and applicability to different scenarios, such as credit card loan fraud (external fraud) and financial statement fraud (internal fraud).

The most frequently used ML techniques are supervised learning (56.73%), unsupervised learning (18.27%), a combination of supervised and unsupervised learning (15.38%), a combination of supervised and deep learning (2.88%), and a combination of a mathematical approach with supervised and semi-supervised learning (0.96%). Figure 8 presents the ML techniques in the literature reviewed and indicates the number of times each type of technique is applied. Some articles applied several ML methods. The algorithms are mainly classified according to the learning method, of which there are four main types: supervised, semi-supervised, unsupervised, and deep learning.

figure 8

Shows the different experimental approaches used in the study. Authors’ own elaboration.

Supervised learning is the most widely used technique, accounting for 56.73% of the mentions in financial fraud studies. In this approach, a model is built from labeled training data, for which the expected outputs are known, so that it can make accurate predictions on new unlabeled data. Common examples of supervised learning techniques include LR, SVM, DT, RF, KNN, NB, and ANN.
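As a minimal illustration of learning from labeled data, the sketch below implements a plain k-nearest neighbors classifier, one of the supervised models listed above. The two features, the sample values, and the function name `knn_predict` are assumptions made for the example. The reviewed studies use real datasets and library implementations.

```python
import math
from collections import Counter

def knn_predict(train_X, train_y, query, k=3):
    """Predict the label of `query` by majority vote among its k nearest training points."""
    distances = sorted(
        (math.dist(x, query), label) for x, label in zip(train_X, train_y)
    )
    votes = Counter(label for _, label in distances[:k])
    return votes.most_common(1)[0][0]

# Hypothetical labeled transactions: (amount in thousands, time of day as a fraction).
train_X = [(0.02, 0.50), (0.05, 0.60), (0.03, 0.40), (0.04, 0.55),
           (2.50, 0.10), (3.10, 0.05), (2.80, 0.12)]
train_y = [0, 0, 0, 0, 1, 1, 1]  # 1 = fraud, 0 = genuine

print(knn_predict(train_X, train_y, (2.90, 0.08)))  # close to the fraud examples
print(knn_predict(train_X, train_y, (0.03, 0.52)))  # close to the genuine examples
```

The known labels in `train_y` are what make this supervised: the prediction for a new transaction is borrowed from the labeled examples closest to it.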

Moreover, unsupervised learning constitutes 18.27% of the mentions. The technique focuses on discovering patterns in the data without labeled examples for training. Examples include DBSCAN, autoencoder, and isolation forest (IF).
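The idea of flagging unusual records without any labels can be sketched with a much simpler stand-in than DBSCAN, autoencoders, or isolation forest: a robust outlier score based on the median absolute deviation (MAD). The amounts, the threshold of 3.5, and the function name are assumptions chosen for illustration only.

```python
from statistics import median

def mad_outliers(values, threshold=3.5):
    """Flag values whose modified z-score exceeds `threshold`; no labels are used."""
    med = median(values)
    mad = median(abs(v - med) for v in values)
    if mad == 0:
        # All values are (almost) identical, so nothing stands out.
        return [False] * len(values)
    # 0.6745 rescales the MAD so the score is comparable to a standard z-score.
    return [abs(0.6745 * (v - med) / mad) > threshold for v in values]

# Hypothetical transaction amounts; one of them is far from the rest.
amounts = [12.0, 15.5, 11.2, 14.8, 13.1, 12.9, 980.0, 14.2]
flags = mad_outliers(amounts)
print([a for a, is_outlier in zip(amounts, flags) if is_outlier])
```

Unlike the supervised case, nothing tells the procedure which transactions are fraudulent. It only reports which ones deviate from the bulk of the data, and an analyst still has to decide whether a flagged record is actually fraud.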

The combination of supervised, unsupervised, and semi-supervised learning is used with a frequency of 1.92%. This technique and/or approach combines elements of supervised and unsupervised learning, using both labeled and unlabeled data to train the models. It is also used when labeled data are scarce or expensive to obtain; thus, the aim is to take advantage of unlabeled information to improve model performance.

Finally, supervised and deep learning represents 2.88% of the mentions. It is based on deep neural networks with multiple neurons and hidden layers to learn complex data representations. It has achieved remarkable developments in areas such as image processing, voice recognition, and machine translation.

Specific questions (SQ)

SQ1: What datasets were used by implementing ML models for financial fraud detection?

First, the data structure and fraud types may vary with the collection of datasets. The performance of fraud detection models may be affected by variations in the number of instances and attributes selected. Therefore, investigating the datasets and their characteristics is relevant, as data differ in terms of data type (number, text) and the data source from which they were obtained (synthetic and/or real), as can be observed in Fig. 9 .

figure 9

Depicts the datasets used in the research on financial fraud detection. Authors’ own elaboration.

Credit card fraud detection

The dataset was created by the Machine Learning Group at Université Libre de Bruxelles. It encompasses anonymized credit card transactions labeled as fraudulent or genuine. The transactions were performed in September 2013 over two days by European cardholders. With only 492 frauds out of 284,807 transactions, the dataset is highly unbalanced: the positive class (frauds) represents only 0.172% of all transactions (Machine Learning Group, 2018).

The dataset contains numerical variables resulting from a principal component analysis (PCA) transformation. For confidentiality, the original features of the data have not been disclosed. Features V1, V2, …, V28 are the principal components obtained through PCA. The only features that have not been transformed with PCA are “Time,” which denotes the seconds elapsed between each transaction and the first transaction in the dataset, and “Amount,” which denotes the transaction amount. The “Class” feature is the response variable, taking the value 1 in case of fraud and 0 otherwise.
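The practical consequence of this imbalance can be made concrete with a short calculation. The sketch below uses only the two counts reported for the dataset. The "always genuine" baseline is a hypothetical classifier introduced for illustration, not a model from the reviewed studies.

```python
TOTAL = 284_807  # transactions reported for the dataset
FRAUD = 492      # transactions with Class == 1

fraud_rate = FRAUD / TOTAL
genuine_per_fraud = (TOTAL - FRAUD) / FRAUD

# Hypothetical baseline that predicts Class == 0 for every transaction.
baseline_accuracy = (TOTAL - FRAUD) / TOTAL
baseline_recall = 0 / FRAUD  # it never identifies a single fraud

print(f"fraud rate: {fraud_rate:.2%}")
print(f"genuine transactions per fraud: {genuine_per_fraud:.0f}")
print(f"baseline accuracy: {baseline_accuracy:.2%}, baseline recall: {baseline_recall:.0%}")
```

A model that labels everything as genuine is correct for more than 99.8% of the transactions while detecting no fraud at all, which is why accuracy alone is a misleading measure on this dataset.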

This dataset has been used by 15 authors in their papers, who have applied different financial fraud detection techniques (Alarfaj et al., 2022 ; Baker et al., 2022 ; Fanai and Abbasimehr, 2023 ; Fang et al., 2019 ; Femila Roseline et al., 2022 ; Hwang and Kim, 2020 ; Ileberi et al., 2021 , 2022 ; Khan et al., 2022 ; Misra et al., 2020 ; Ounacer et al., 2022 ).

Statlog (German credit data)

The dataset was proposed by Professor Hofmann to the UC Irvine ML repository on November 16, 1994, for facilitating credit rating (Hofmann, 1994). It mainly aims to determine whether a person presents a favorable or unfavorable credit risk (binary rating). The set is multivariate, which implies that it contains many attributes used in credit rating. These attributes include information on existing current account status, credit duration, credit history, and credit purpose and amount, among others. In total, the dataset has 20 attributes describing several characteristics of individuals and contains 1000 instances; it has been widely used in research related to credit rating (Esenogho et al., 2022; Fanai and Abbasimehr, 2023; Lee et al., 2018; Pumsirirat and Yan, 2018; Seera et al., 2021).

Statlog (Australian credit approval)

The dataset belongs to the UC Irvine ML repository and was created by Ross Quinlan in 1997. It focuses on credit card applications within the financial field (Quinlan, 1997). It has a total of 690 instances and 14 attributes, of which 6 are numeric (integer/real) and 8 are categorical; consequently, it is multivariate, that is, it contains multiple variables and/or attributes. Several studies have used this dataset (Lee et al., 2018; Pumsirirat and Yan, 2018; Seera et al., 2021; Singh et al., 2022).

China Stock Market and Accounting Research

The China Stock Market and Accounting Research (CSMAR) database contains financial reports and records of violations. It provides information on China’s stock markets and the financial statements of listed companies; the data were collected between 1998 and 2016 from publicly funded companies (CSMAR, 2022). It includes both non-fraudulent companies and fraudulent companies that committed several types of fraud, such as showing higher profits and/or earnings, fictitious assets, false records, and other irregularities in financial reporting.

The set comprises 35,574 samples, including 337 annual fraud samples of companies in the Chinese stock market. This is selected as a data source to illustrate the financial statement information of listed companies in three studies (Achakzai and Juan, 2022 ; Y. Chen and Wu, 2022 ; Shou et al., 2023 ).

Synthetic financial datasets for fraud detection

This dataset was generated by the PaySim mobile money simulator using aggregated data from a private dataset deriving from one month of financial records from a mobile money service in an African country (López-Rojas, 2017). The original records were provided by a multinational company offering mobile financial services in more than 14 countries worldwide. The dataset has been used in numerous studies (Alwadain et al., 2023; Hwang and Kim, 2020; Moreira et al., 2022).

The synthetic dataset provided is a scaled-down version, representing a quarter of the original dataset, and was made available on Kaggle. It comprises 6,362,620 samples, with 8213 fraudulent transaction samples and 6,354,407 non-fraudulent transactions. It includes several attributes related to mobile money transactions: transaction type (cash-in, cash-out, debit, payment, and transfer); transaction amount in local currency; customer information (customer conducting the transaction and transaction recipient); balances before and after the transaction; and fraudulent behavior indicators (isFraud and isFlaggedFraud). These indicators define a binary classification task.

Default of credit card clients

This dataset was created by I-Cheng Yeh, introduced on January 25, 2016, and is available in the UC Irvine ML repository (Yeh, 2016). It is used for classification tasks and focuses on defaulted payments of credit card customers in Taiwan in the business area. It is a multivariate dataset with 30,000 instances and 24 attributes, including the amount of credit granted, payment history, and statement records spanning April through September 2005. This data source is selected in studies such as those by Esenogho et al. (2022), Pumsirirat and Yan (2018), and Seera et al. (2021).

Synthetic data from a financial payment system

Edgar Lopez Rojas created the dataset in 2017. The synthetic data were generated in the BankSim payment simulator. It is based on a sample of transactional data provided by a bank in Spain (López-Rojas, 2017 ). It includes the following characteristics: step, customer ID, age, gender, zip code, merchant ID, zip code of merchant, category of purchase, amount of purchase, and fraud status. It comprises 594,643 transactions, of which ~1.2% (7200) were labeled as fraud and the rest (587,443) were labeled as genuine, and it was processed as a binary classification problem. The dataset has been used in several investigations (Esenogho et al., 2022 ; Pumsirirat and Yan, 2018 ; Seera et al., 2021 ).

Compustat

Compustat is a financial and economic information and research database (Compustat, 2022). It contains characteristics related to various aspects of companies, such as asset quality, revenues earned, administrative and sales expenses, and sales growth, among others. COMPUSTAT collects and stores detailed information on listed companies in the United States and Canada. The set includes information on 61 characteristics and consists of 228 companies, of which half showed fraud in their information while the other half did not (binary classification); it is used in the studies by Dutta et al. (2017) and Whiting et al. (2012).

Insurance Company Benchmark (COIL 2000)

This dataset was used in the CoIL 2000 challenge, was created by Peter Van Der Putten, and is available at the UC Irvine Machine Learning Repository. It consists of 9822 instances and 86 attributes containing information about customers of an insurance company and includes data on product use and sociodemographic data (Putten, 2000). It is characterized as multivariate and has been used for regression/classification tasks (Huang et al., 2018; Sathya and Balakumar, 2022).

Bitcoin network transactional metadata

This dataset contains Bitcoin transaction metadata from 2011 to 2013. It was created by Omer Shafiq (Kaggle handle: OmerShafiq) and introduced to the Kaggle online community in 2019. The set comprises 11 attributes and 30,000 instances related to Bitcoin transactions, bitcoin flows, connections between transactions, average ratings, and malicious transactions (Omershafiq, 2019). It is useful for investigating anomalies and detecting fraud in Bitcoin transactions (Ashfaq et al., 2022).

SQ2: What were the metrics used to assess the performance of ML models to detect financial fraud?

Based on previous studies (Nicholls et al., 2021; Shahana et al., 2023), evaluating ML models with performance metrics is the last step in determining whether the results align with the problem at hand. The metrics quantify a model's ability to perform a specific task, such as classification, regression, or clustering, and they allow the performance of different models to be compared.

Many evaluation metrics have been used in previous studies, such as precision, sensitivity (also called recall), accuracy, and area under the curve. Most of these metrics can be calculated from the confusion matrix. Figure 10 shows the confusion matrix, which compares the true values with the predicted ones, based on the study by Torrano et al. (2018).

Figure 10. Confusion matrix generated during the evaluation of the financial fraud detection models. Authors’ own elaboration.

According to previous studies (Shahana et al., 2023; Zhao and Bai, 2022), a true positive (TP) is a predicted positive (fraud) that matches the true value; a true negative (TN) is a correctly predicted negative (no fraud); a false positive (FP) is a predicted positive whose true value is negative (no fraud); and a false negative (FN) is a predicted negative whose true value is positive (fraud). FP and FN represent the misclassification cost, also known as the prediction error of the classification model.
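The four cells of the confusion matrix can be counted directly from paired labels. The following is a minimal sketch, assuming labels are encoded as 1 for fraud and 0 for no fraud; the function name `confusion_counts` and the toy label lists are hypothetical and are not taken from the reviewed studies.

```python
from typing import Sequence, Tuple


def confusion_counts(y_true: Sequence[int], y_pred: Sequence[int]) -> Tuple[int, int, int, int]:
    """Return (TP, TN, FP, FN), assuming 1 = fraud and 0 = no fraud."""
    if len(y_true) != len(y_pred):
        raise ValueError("y_true and y_pred must have the same length")
    tp = tn = fp = fn = 0
    for truth, pred in zip(y_true, y_pred):
        if pred == 1 and truth == 1:
            tp += 1  # predicted fraud, actually fraud
        elif pred == 0 and truth == 0:
            tn += 1  # predicted no fraud, actually no fraud
        elif pred == 1 and truth == 0:
            fp += 1  # predicted fraud, actually no fraud
        else:
            fn += 1  # predicted no fraud, actually fraud
    return tp, tn, fp, fn


# Hypothetical toy labels for illustration only.
y_true = [1, 0, 0, 1, 0, 1, 0, 0]
y_pred = [1, 0, 1, 0, 0, 1, 0, 0]
tp, tn, fp, fn = confusion_counts(y_true, y_pred)
print(tp, tn, fp, fn)
```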

The metrics used to evaluate the effectiveness of supervised ML techniques are as follows. Accuracy is the most commonly used metric (Ramírez-Alpízar et al., 2020). It is defined as the proportion of correct predictions over the total number of records analyzed, and it evaluates how well a binary classification model distinguishes between the two classes. Equation (1) calculates the accuracy metric.

The sensitivity metric, also known as recall or the true positive rate (TPR), is the ratio of correctly identified fraudulent samples to the total number of fraudulent samples. Equation (2) calculates the sensitivity metric.

The specificity metric (true negative rate, TNR) is the proportion of non-fraudulent samples correctly designated as non-fraudulent. It is represented in Eq. (3).

Precision is the ratio of correctly classified fraudulent predictions to the total number of fraudulent predictions. Equation (4) calculates the precision metric.

F1-score is a metric that combines precision and recall using their harmonic mean (Bakumenko and Elragal, 2022). It is presented in Eq. (5).

Type I error (false positive rate, FPR) is the number of legitimate samples mistakenly labeled as fraudulent as a proportion of all legitimate samples. The metric is defined in Eq. (6).

Type II error (false negative rate, FNR) is the proportion of fraudulent samples incorrectly designated as non-fraudulent. It is defined in Eq. (7). Together, Type I and II errors make up the overall error rate.
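The supervised metrics described above all follow from the four confusion-matrix counts. The sketch below uses the standard textbook formulas; the function name `classification_metrics` and the example counts are hypothetical, and the code assumes every denominator is non-zero (that is, both classes occur and at least one positive prediction is made).

```python
from typing import Dict


def classification_metrics(tp: int, tn: int, fp: int, fn: int) -> Dict[str, float]:
    """Standard metrics from confusion-matrix counts (assumes non-zero denominators)."""
    total = tp + tn + fp + fn
    sensitivity = tp / (tp + fn)   # recall, true positive rate
    precision = tp / (tp + fp)     # share of fraud predictions that are correct
    return {
        "accuracy": (tp + tn) / total,
        "sensitivity": sensitivity,
        "specificity": tn / (tn + fp),    # true negative rate
        "precision": precision,
        "f1": 2 * precision * sensitivity / (precision + sensitivity),
        "type_i_error": fp / (fp + tn),   # false positive rate
        "type_ii_error": fn / (fn + tp),  # false negative rate
    }


# Hypothetical counts for illustration only.
for name, value in classification_metrics(tp=80, tn=900, fp=15, fn=5).items():
    print(f"{name}: {value:.4f}")
```

Note that sensitivity and Type II error sum to one, as do specificity and Type I error, which is a quick consistency check on any reported results.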

The area under the curve (AUC) is the area under the receiver operating characteristic (ROC) curve, which plots TPR against FPR (Y. Chen and Wu, 2022). AUC values range from 0 to 1; the better an ML model separates the two classes, the higher its AUC value. It is a metric that represents the model’s performance when differentiating between two classes.
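One way to compute AUC without drawing the curve relies on a well-known equivalence: the area under the ROC curve equals the probability that a randomly chosen fraudulent sample receives a higher score than a randomly chosen legitimate one, with ties counted as one half. The sketch below follows that pairwise definition; the function name `roc_auc` and the toy scores are hypothetical, and the quadratic loop is meant for illustration rather than large datasets.

```python
from typing import Sequence


def roc_auc(y_true: Sequence[int], scores: Sequence[float]) -> float:
    """AUC as the fraction of (fraud, legitimate) pairs ranked correctly; ties count 0.5."""
    pos = [s for t, s in zip(y_true, scores) if t == 1]
    neg = [s for t, s in zip(y_true, scores) if t == 0]
    if not pos or not neg:
        raise ValueError("both classes must be present")
    wins = 0.0
    for p in pos:
        for n in neg:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return wins / (len(pos) * len(neg))


# Hypothetical scores: higher means "more likely fraud".
print(roc_auc([0, 0, 1, 1], [0.1, 0.4, 0.35, 0.8]))
```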

Following the guidelines in previous studies (Amrutha et al., 2023; García-Ordás et al., 2023; Palacio, 2019), some metrics used to evaluate the effectiveness of unsupervised ML techniques are defined below.

The silhouette coefficient helps identify the most appropriate number of clusters; a higher coefficient means better clustering quality for that number of clusters. Equation (8) calculates the metric.

where x denotes the average of the distances of observation j with respect to the rest of the observations of the cluster to which j belongs. Furthermore, y denotes the minimum distance to a different cluster. The silhouette score takes values between −1 and 1. Based on the study by Viera et al. (2023), 1 (correct) indicates that observation j is assigned to a suitable cluster, zero (0) indicates that observation j lies between two distinct groups, and −1 (incorrect) indicates that observation j is assigned to the wrong cluster.
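The silhouette value of each observation can be sketched in a few lines. This example assumes the common form (y − x) / max(x, y), with y taken as the smallest average distance from the observation to the members of any other cluster; that reading of y, the function name `silhouette_values`, the zero value for singleton clusters, and the toy points are all assumptions made for illustration. `math.dist` requires Python 3.8 or later.

```python
import math
from typing import Dict, Hashable, List, Sequence


def silhouette_values(points: Sequence[Sequence[float]], labels: Sequence[Hashable]) -> List[float]:
    """Per-observation silhouette (y - x) / max(x, y) using Euclidean distance."""
    clusters: Dict[Hashable, List[int]] = {}
    for idx, lab in enumerate(labels):
        clusters.setdefault(lab, []).append(idx)
    if len(clusters) < 2:
        raise ValueError("at least two clusters are required")
    values = []
    for j, lab in enumerate(labels):
        own = [i for i in clusters[lab] if i != j]
        if not own:
            values.append(0.0)  # assumed convention for singleton clusters
            continue
        # x: mean distance to the other members of j's own cluster
        x = sum(math.dist(points[j], points[i]) for i in own) / len(own)
        # y: smallest mean distance to the members of any other cluster
        y = min(
            sum(math.dist(points[j], points[i]) for i in members) / len(members)
            for other, members in clusters.items()
            if other != lab
        )
        values.append((y - x) / max(x, y))
    return values


# Hypothetical toy data: two well-separated groups.
pts = [(0.0, 0.0), (0.0, 1.0), (10.0, 0.0), (10.0, 1.0)]
good = silhouette_values(pts, [0, 0, 1, 1])
bad = silhouette_values(pts, [0, 1, 0, 1])
print(sum(good) / len(good), sum(bad) / len(bad))
```

Averaging the per-observation values gives the overall coefficient; in this toy case the natural grouping scores close to 1, while the deliberately mixed labeling scores below zero.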

The Rand index is a similarity measure between two clusterings that considers all pairs of observations and counts those that are treated consistently (assigned to the same cluster or to different clusters) in both the predicted and the true clustering. Equation (9) calculates the index.
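As a rough illustration, the Rand index can be computed by enumerating every pair of observations and checking whether the two clusterings agree on it. The sketch below uses the standard pair-counting definition; the function name `rand_index` and the toy labelings are hypothetical.

```python
from itertools import combinations
from typing import Hashable, Sequence


def rand_index(labels_true: Sequence[Hashable], labels_pred: Sequence[Hashable]) -> float:
    """Fraction of observation pairs on which the two clusterings agree."""
    n = len(labels_true)
    if n != len(labels_pred) or n < 2:
        raise ValueError("need two labelings of equal length with at least two items")
    agree = 0
    total = 0
    for i, j in combinations(range(n), 2):
        same_true = labels_true[i] == labels_true[j]
        same_pred = labels_pred[i] == labels_pred[j]
        if same_true == same_pred:  # both "together" or both "apart"
            agree += 1
        total += 1
    return agree / total


# Hypothetical labelings: the prediction splits one true cluster in two.
print(rand_index([0, 0, 1, 1], [0, 0, 1, 2]))
```

Because only co-membership matters, renaming the cluster labels does not change the result.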

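The Davies–Bouldin metric is a score used to evaluate clustering algorithms. It is defined as the average, over all clusters, of each cluster's similarity to its most similar cluster, where lower values indicate better separation. It is represented in Eq. (10).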

where k denotes the number of clusters, \({c}_{i}\) and \({c}_{j}\) represent the centroids of clusters i and j, respectively, with \(d\left({c}_{i},{c}_{j}\right)\) as the distance between them, while \({\alpha }_{i}\) and \({\alpha }_{j}\) correspond to the average distance of all elements in clusters i and j to their respective centroids \({c}_{i}\) and \({c}_{j}\) (Viera et al., 2023).
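The Davies–Bouldin score can be sketched from these ingredients. The example assumes the usual form, in which each cluster i contributes the largest ratio of summed within-cluster scatter to the distance between centroids over all other clusters j, and the contributions are averaged over the k clusters. The function name `davies_bouldin` and the toy points are hypothetical, and the code assumes no two centroids coincide.

```python
import math
from typing import Dict, Hashable, List, Sequence, Tuple


def davies_bouldin(points: Sequence[Sequence[float]], labels: Sequence[Hashable]) -> float:
    """Mean over clusters of max_j (alpha_i + alpha_j) / d(c_i, c_j); lower is better."""
    clusters: Dict[Hashable, List[Sequence[float]]] = {}
    for p, lab in zip(points, labels):
        clusters.setdefault(lab, []).append(p)
    if len(clusters) < 2:
        raise ValueError("at least two clusters are required")
    # Centroid of each cluster: coordinate-wise mean of its members.
    centroids: Dict[Hashable, Tuple[float, ...]] = {
        lab: tuple(sum(coord) / len(members) for coord in zip(*members))
        for lab, members in clusters.items()
    }
    # alpha: average distance of a cluster's members to its own centroid.
    alpha = {
        lab: sum(math.dist(p, centroids[lab]) for p in members) / len(members)
        for lab, members in clusters.items()
    }
    total = 0.0
    for i in clusters:
        total += max(
            (alpha[i] + alpha[j]) / math.dist(centroids[i], centroids[j])
            for j in clusters
            if j != i
        )
    return total / len(clusters)


# Hypothetical toy data: two compact, well-separated clusters.
pts = [(0.0, 0.0), (0.0, 2.0), (10.0, 0.0), (10.0, 2.0)]
print(davies_bouldin(pts, [0, 0, 1, 1]))
```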

The Fowlkes–Mallows index is defined as the geometric mean of precision and recall, represented in Eq. (11).
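In the clustering setting, precision and recall are usually computed over pairs of observations rather than single predictions. The sketch below assumes that pair-counting convention: a pair grouped together in both clusterings is a true positive, a pair grouped together only in the prediction is a false positive, and a pair grouped together only in the reference is a false negative. The function name `fowlkes_mallows` and the toy labelings are hypothetical.

```python
import math
from itertools import combinations
from typing import Hashable, Sequence


def fowlkes_mallows(labels_true: Sequence[Hashable], labels_pred: Sequence[Hashable]) -> float:
    """Geometric mean of pairwise precision and recall between two clusterings."""
    if len(labels_true) != len(labels_pred):
        raise ValueError("labelings must have the same length")
    tp = fp = fn = 0
    for i, j in combinations(range(len(labels_true)), 2):
        same_true = labels_true[i] == labels_true[j]
        same_pred = labels_pred[i] == labels_pred[j]
        if same_true and same_pred:
            tp += 1
        elif same_pred:
            fp += 1
        elif same_true:
            fn += 1
    if tp == 0:
        return 0.0  # no pair is grouped together in both clusterings
    precision = tp / (tp + fp)
    recall = tp / (tp + fn)
    return math.sqrt(precision * recall)


# Hypothetical labelings: the prediction splits one true cluster in two.
print(fowlkes_mallows([0, 0, 1, 1], [0, 0, 1, 2]))
```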

The cophenetic correlation coefficient measures how faithfully a dendrogram (tree diagram) produced by hierarchical clustering preserves the pairwise distances between the original observations. Equation (12) defines the metric.

where \(x(i,j)=|{x}_{i}-{x}_{j}|\) represents the Euclidean distance between the i th and j th points of \(x\), \(t(i,j)\) is the height of the dendrogram node at which the two points \({t}_{i}\) and \({t}_{j}\) are first joined, and \(\bar{x}\) and \(\bar{t}\) are the mean values of \(x(i,j)\) and \(t(i,j)\), respectively.
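Given the original pairwise distances and the dendrogram heights at which each pair is joined, the coefficient is the Pearson correlation between the two lists. The sketch below assumes both lists are supplied in the same pair order and that neither is constant. The function name `cophenetic_correlation` is hypothetical, and the toy heights were derived by hand for single-linkage clustering of the one-dimensional points 0, 1, and 5.

```python
import math
from typing import Sequence


def cophenetic_correlation(x: Sequence[float], t: Sequence[float]) -> float:
    """Pearson correlation between original distances x(i, j) and dendrogram heights t(i, j)."""
    if len(x) != len(t) or len(x) < 2:
        raise ValueError("x and t must list the same pairs, at least two of them")
    x_bar = sum(x) / len(x)
    t_bar = sum(t) / len(t)
    num = sum((xi - x_bar) * (ti - t_bar) for xi, ti in zip(x, t))
    den = math.sqrt(
        sum((xi - x_bar) ** 2 for xi in x) * sum((ti - t_bar) ** 2 for ti in t)
    )
    return num / den


# Toy example for points 0, 1, 5, with pairs ordered (0,1), (0,2), (1,2).
original = [1.0, 5.0, 4.0]  # pairwise distances
heights = [1.0, 4.0, 4.0]   # single-linkage merge heights, derived by hand
print(cophenetic_correlation(original, heights))
```

Values close to 1 suggest that the dendrogram preserves the original distances well.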

Discussion and conclusion

Research on the detection of financial fraud by applying ML techniques is a significant topic. On the one hand, fraud directly affects the business world and, on the other hand, detecting it early involves great challenges; this has led to designing tools using AI, such as ML techniques. This study is an SLR using adaptations of the PRISMA and Kitchenham methods to critically analyze and synthesize the study results. Research articles published in Scopus, IEEE Xplore, Taylor & Francis, SAGE, and ScienceDirect were explored. The results were presented in two parts. The first one included a bibliometric study with the open-source software VOSviewer, followed by a discussion of the SLR results.

The bibliometric analysis presented the results of the authors, articles, sources, countries, and most important trends in the literature on financial fraud detection by applying ML, as well as an analysis of fraud types, ML models, and datasets. From the 104 articles dating from 2012 to 2023, several types of fraudulent activities are described and classified as external (e.g., credit cards, insurance) or internal (e.g., financial statements, money laundering) fraud, and a brief report on fraud in general is provided. Further, supervised and unsupervised ML techniques were extracted along with the 10 most used models, with RF standing out among the supervised techniques and autoencoder among the unsupervised ones.

During the literature review on the detection of financial fraud using machine learning models, it became evident that several authors have made significant contributions. However, some stand out more in terms of the number of publications and citations. Some of the most notable ones, Ahmed M. with 318 citations, Ileberi E. with 82, and Chen S. with 84, have made important advances in the field. Others, such as Abdallah A., with only one publication, but with 333 citations, have also made a considerable impact. And although researchers such as Khan S. and Mishra B. have fewer citations, the combined work of all these authors has established a robust knowledge base, providing a deeper understanding of the challenges and opportunities present in financial fraud detection through machine learning techniques.

Consistent with the analysis of the article clusters, clusters 2, 4 and 11 emerge as the most influential in this field, with topics of interdisciplinary interest (artificial intelligence/machine learning, accounting, finance) among academics and auditing firms. The SLR shows that authors in these domains often cooperate on publications; in turn, the studies by Huang et al. (2018), J. Kim et al. (2019), Sahin et al. (2013), and Dutta et al. (2017) are highly cited.

Similarly, the leading countries in the research area include China, which has the largest number of published articles, followed by India and Saudi Arabia. The production of articles on the subject was found to be geographically distributed among countries whose economies are developing or in transition, which indicates a greater capacity for the production of papers and research. By comparison, Ashtiani and Raahemi (2022) found the United States leading with the largest number of papers (18) in the area, followed by China (8) and Greece (7), while Al-Hashedi and Magalingam (2021) report India as the top producer of articles with 24, followed by China (14) and the United States (9).

The journals that have published these studies are mainly in the accounting and computer science domains. Much of the literature on financial fraud detection through ML models appears in Computers and Security, Expert Systems with Applications, and Knowledge-Based Systems, as supported by Al-Hashedi and Magalingam (2021) and Ali et al. (2022). The keywords highlighted in the studies include crime, fraud detection, and ML. These words indicate a central focus on the financial industry, where learning and/or data mining systems help discover patterns or anomalies in financial data, and they point to attractive trends and approaches in the research field.

The literature includes articles investigating fraud types, particularly credit card loan fraud and insurance fraud, which are of great interest to the scientific community (Al-Hashedi and Magalingam, 2021; Ali et al., 2022; West and Bhattacharya, 2016). This study has classified the different types of fraud into internal and external, and sub-classifications have been derived. In both types, ML techniques have been used to detect financial fraud: supervised (59 articles), unsupervised (19 articles), supervised and unsupervised (16 articles), and deep learning (3 articles), among others. Most of the studies analyzed have developed binary classification models, that is, fraud or non-fraud. Supervised learning techniques require labeled data, and the most frequently used models are LR, RF, and SVM, among others. In the experiments, the prevalence of metrics such as accuracy, precision, sensitivity, and F1-score is highlighted. In unsupervised learning, the data are unlabeled and the techniques focus on discovering new patterns with algorithms such as DBSCAN, autoencoder, and IF, among others. The evaluation with internal metrics was not reported in detail. Few studies using semi-supervised learning and deep learning techniques were found, because these techniques are novel.

Further, the keyword trend shows that research works address ML, learning algorithms, deep learning, SVM, fraudulent transactions, and anomaly detection, yet there is evidently little research on unsupervised learning and deep learning. The scarce use of these techniques may be because of the complexity of the models and the high consumption of computational resources. In the analysis of the 86 experiment articles, few articles were found that used unsupervised techniques. Also, a large part of the datasets used is labeled, which calls for further experimentation with models on unlabeled real-world datasets (Ounacer et al., 2018; Pumsirirat and Yan, 2018; Rubio et al., 2020; Van Capelleveen et al., 2016; Vanini et al., 2023). Meanwhile, labeled data are costly because an expert is required for their construction. Thus, more attention has been given to data origin, preprocessing, and feature extraction before training an ML model to increase detection accuracy. Accordingly, it should be emphasized that deep learning models require more thorough design and tuning than earlier models. They are quite sensitive to the architecture and the choice of hyperparameters. Further, the required data quality and quantity are relatively high, so they should be considered at the design stage.

The studies show that the datasets for the experiments were taken from the stock exchanges of China, Canada, the United States, Taiwan, and Tehran, among others. The researchers used ML models to detect financial fraud in credit card loans, highlighting the use of the “Credit Card Fraud Detection” dataset, mentioned 15 times. Also, the performance of ML models can be affected by the selected dataset, in particular by its number of attributes and instances. From the analysis, it was observed that most of the articles use real datasets obtained from existing databases, historical records, or other collection methods, and few studies (four articles) use synthetic datasets, which are generated by modeling or simulation techniques and try to mimic a real dataset.

Still, the integration of real and synthetic datasets enables a comprehensive approach to the problem by providing a basis and complementary information for conclusions and comparisons with other studies on the performance of ML models. However, the datasets used in recent studies, spanning from 2012 to 2023, raise concerns about obsolete data, some dating from approximately 1994. Because of their age, such data do not provide effective and accurate results in the current context, as new fraud modalities emerge day after day, with characteristics and behavior patterns that have evolved significantly over time.

The literature review and bibliometric analyses on financial fraud detection using machine learning and its various techniques, covering 2012 to 2023, show a remarkable evolution in this field. Authors including Ahmed M., Ileberi E., and Chen S. have made important contributions with a high number of citations. There has been fundamental interdisciplinary collaboration between areas such as artificial intelligence, accounting, finance, and information security, highlighting widely cited studies such as Huang et al. (2018), J. Kim et al. (2019), Sahin et al. (2013), and Dutta et al. (2017). Countries such as China, India and Saudi Arabia lead in publications, which reflects the global effort of emerging economies. Supervised learning techniques such as Random Forest, and unsupervised ones, like Autoencoder, are the most widely used. Furthermore, the effort and enthusiasm for the use of deep learning, despite its complexity and high computational resource requirements, are evident.

Research mainly uses real datasets such as those from the Chinese, Canadian, US, Taiwanese, and Tehran stock exchanges, with the “Credit Card Fraud Detection” dataset being the most important one. The journals that publish these studies belong both to the accounting area and to computer science, with extensive literature in Computers and Security, Expert Systems with Applications, and Knowledge-Based Systems. While it is true that the accuracy of fraud detection depends on the quality of the data and preprocessing with various algorithms, the need for robust and updated approaches to face new fraud modalities is particularly highlighted.

Limitations and scope for future research

The study had limitations that affected the scope and interpretation of the results. Although a systematic review was performed, the lack of quantitative support in the data collected is acknowledged. Of the 104 articles identified in the SLR, 18 are systematic reviews, which limits the availability of studies with specific details or experiments. This affected the depth of the analysis and the comprehensiveness of the results obtained.

The literature review reveals a predominant emphasis on the banking sector, especially in relation to credit card fraud and insurance fraud. The narrow focus leads to a lack of diversity in the types of fraud studied, excluding internal fraud types such as embezzlement, racketeering, smurfing, defalcation, collusion, signature forgery, and manipulation of accounting documents, among others. The underrepresentation of these other fraud types compromises the generalization of the findings and the applicability of ML models to contexts beyond the banking sector.

The datasets analyzed show a significant deficiency in the representation of fraud types. It can be observed that most of these datasets originated from the main stock exchanges and, additionally, the information used to carry out the experiments is old. This scenario indicates the inclusion of non-contemporary fraud types in the analysis. The limited availability of information on the performance metrics of the unsupervised learning models made it difficult to count the evaluation metrics used to predict financial fraud.

The field of financial fraud detection using ML models offers promising prospects for future research. An area of potential improvement is experimentation with advanced techniques, such as reinforcement learning or deep neural network architectures, to improve the accuracy and efficiency of models, including unsupervised learning. This approach could enable the development of more sophisticated systems capable of identifying complex fraud patterns and dynamically adjusting to the changing strategies of criminals, who are constantly innovating new fraud methods.

Moreover, it is suggested that the applicability of fraud detection systems in contexts other than banking be analyzed by adopting the anomaly approach, which would make it possible to move forward in the detection of fraud in real-time and minimize risks in organizations. It is also proposed that a dataset be created, containing real context information, which is freely accessible and includes new fraud methods to provide the scientific community with an updated dataset.

Data availability

The datasets generated and/or analyzed in this study are available in the Harvard Dataverse repository https://doi.org/10.7910/DVN/CM8NVY .

Abdallah A, Maarof MA, Zainal A (2016) Fraud detection system: a survey. J Netw Comput Appl 68:90–113. https://doi.org/10.1016/j.jnca.2016.04.007

Achakzai MAK, Juan P (2022) Using machine learning meta-classifiers to detect financial frauds. Financ Res Lett 48:102915. https://doi.org/10.1016/j.frl.2022.102915

Ahmed M, Mahmood AN, Islam MdR (2016) A survey of anomaly detection techniques in financial domain. Future Gener Comput Syst 55:278–288. https://doi.org/10.1016/j.future.2015.01.001

Al Ali A, Khedr AM, El-Bannany M, Kanakkayil S (2023) A powerful predicting model for financial statement fraud based on optimized XGBoost ensemble learning technique. Appl Sci 13(4):2272. https://doi.org/10.3390/app13042272

Alarfaj FK, Malik I, Khan HU, Almusallam N, Ramzan M, Ahmed M (2022) Credit card fraud detection using state-of-the-art machine learning and deep learning algorithms. IEEE Access 10:39700–39715. https://doi.org/10.1109/ACCESS.2022.3166891

Al-Hashedi KG, Magalingam P (2021) Financial fraud detection applying data mining techniques: a comprehensive review from 2009 to 2019. Comput Sci Rev 40:100402. https://doi.org/10.1016/j.cosrev.2021.100402

Ali A, Abd Razak S, Othman SH, Eisa TAE, Al-Dhaqm A, Nasser Tusneem ME, Elshafie H, Saif A (2022) Financial fraud detection based on machine learning: a systematic literature review. Appl Sci (Switz). https://doi.org/10.3390/app12199637

Alsuwailem AAS, Salem E, Saudagar AKJ (2022) Performance of different machine learning algorithms in detecting financial fraud. Comput Econ. https://doi.org/10.1007/s10614-022-10314-x

Alwadain A, Ali RF, Muneer A (2023) Estimating financial fraud through transaction-level features and machine learning. Mathematics 11(5):1184. https://doi.org/10.3390/math11051184

Amrutha E, Arivazhagan S, Jebarani WSL (2023) Deep clustering network for steganographer detection using latent features extracted from a novel convolutional autoencoder. Neural Process Lett 55(3):2953–2964. https://doi.org/10.1007/s11063-022-10992-6

Arévalo F, Barucca P, Téllez-León I-E, Rodríguez W, Gage G, Morales R (2022) Identifying clusters of anomalous payments in the Salvadorian payment system. Lat Am J Cent Bank 3(1):100050. https://doi.org/10.1016/j.latcb.2022.100050

Ashfaq T, Khalid R, Yahaya A, Aslam S, Alsafari S, Hameed I (2022) A machine learning and blockchain based efficient fraud detection mechanism. Sensors 22(19):7162. https://doi.org/10.3390/s22197162

Ashtiani MN, Raahemi B (2022) Intelligent fraud detection in financial statements using machine learning and data mining: a systematic literature review. IEEE Access 10:72504–72525. https://doi.org/10.1109/ACCESS.2021.3096799

Aslam F, Hunjra A, Ftiti Z, Louhichi W, Shams T (2022) Insurance fraud detection: evidence from artificial intelligence and machine learning. Res Int Bus Financ. https://doi.org/10.1016/j.ribaf.2022.101744

Baghdasaryan V, Davtyan H, Sarikyan A, Navasardyan Z (2022) Improving tax audit efficiency using machine learning: the role of taxpayer’s network data in fraud detection. Appl Artif Intell 36(1). https://doi.org/10.1080/08839514.2021.2012002

Baker MR, Mahmood ZN, Shaker EH (2022) Ensemble learning with supervised machine learning models to predict credit card fraud transactions. Rev Intell Artif. https://doi.org/10.18280/ria.360401

Bakumenko A, Elragal A (2022) Detecting anomalies in financial data using machine learning algorithms. Systems. https://doi.org/10.3390/systems10050130

Bekirev AS, Klimov VV, Kuzin MV, Shchukin BA (2015) Payment card fraud detection using neural network committee and clustering. Optical Mem Neural Netw 24(3):193–200. https://doi.org/10.3103/S1060992X15030030

Benchaji I, Douzi S, Ouahidi BEl (2021) Credit card fraud detection model based on LSTM recurrent neural networks. J Adv Inf Technol 12(2):113–118. https://doi.org/10.12720/jait.12.2.113-118

Błaszczyński J, de Almeida Filho AT, Matuszyk A, Szeląg M, Słowiński R (2021) Auto loan fraud detection using dominance-based rough set approach versus machine learning methods. Expert Syst Appl 163:113740. https://doi.org/10.1016/j.eswa.2020.113740

Bolgorian M, Mayeli A, Ronizi NG (2023) CEO compensation and money laundering risk. J Econ Criminol 1:100007. https://doi.org/10.1016/j.jeconc.2023.100007

Chen S (2016) Detection of fraudulent financial statements using the hybrid data mining approach. SpringerPlus 5(1):89. https://doi.org/10.1186/s40064-016-1707-6

Chen S, Goo Y-JJ, Shen Z-D (2014) A hybrid approach of stepwise regression, logistic regression, support vector machine, and decision tree for forecasting fraudulent financial statements. Sci World J 2014:1–9. https://doi.org/10.1155/2014/968712

Chen Y, Wu Z (2022) Financial fraud detection of listed companies in China: a machine learning approach. Sustainability 15(1):105. https://doi.org/10.3390/su15010105

Chullamonthon P, Tangamchit P (2023) Ensemble of supervised and unsupervised deep neural networks for stock price manipulation detection. Expert Syst Appl 220:119698. https://doi.org/10.1016/j.eswa.2023.119698

Compustat (2022) Compustat. S&P Global Market Intelligence. https://www.marketplace.spglobal.com/en/datasets?cq_cmp=9778467255&cq_plac=&cq_net=g&cq_pos=&cq_plt=gp&utm_source=google&utm_medium=cpc&utm_campaign=DMS_Marketplace_Search_Google&utm_term=&utm_content=586436401424&_bt=586436401424&_bk=&_bm=&_bn=g&_bg=133704002389&gclid=Cj0KCQjw4s-kBhDqARIsAN-ipH3TguUoVohfDZgD65fjvKomc6BBgJ3uA9zP95m6u4vOs5yG7_L7w2UaAnnvEALw_wcB

CSMAR (2022) China Stock Market & Accounting Research (CSMAR). Wharton University of Pennsylvania. https://wrds-www.wharton.upenn.edu/pages/about/data-vendors/china-stock-market-accounting-research-csmar/

Dalal S, Seth B, Radulescu M, Secara C, Tolea C (2022) Predicting fraud in financial payment services through optimized hyper-parameter-tuned XGBoost model. Mathematics 10(24):4679. https://doi.org/10.3390/math10244679

Dantas RM, Firdaus R, Jaleel F, Neves Mata P, Mata MN, Li G (2022) Systemic acquired critique of credit card deception exposure through machine learning. J Open Innov: Technol Mark Complex 8(4):192. https://doi.org/10.3390/joitmc8040192

Domashova J, Kripak E (2021) Identification of non-typical international transactions on bank cards of individuals using machine learning methods. Procedia Comput Sci 190:178–183. https://doi.org/10.1016/j.procs.2021.06.023

Domashova J, Kripak E (2022) Development of a generalized algorithm for identifying atypical bank transactions using machine learning methods. Procedia Comput Sci 213:101–109. https://doi.org/10.1016/j.procs.2022.11.044

Dutta I, Dutta S, Raahemi B (2017) Detecting financial restatements using data mining techniques. Expert Syst Appl 90:374–393. https://doi.org/10.1016/j.eswa.2017.08.030

Elshaar S, Sadaoui S (2020) Semi-supervised classification of fraud data in commercial auctions. Appl Artif Intell 34(1):47–63. https://doi.org/10.1080/08839514.2019.1691341

Esenogho E, Mienye ID, Swart TG, Aruleba K, Obaido G (2022) A neural network ensemble with feature engineering for improved credit card fraud detection. IEEE Access 10:16400–16407. https://doi.org/10.1109/ACCESS.2022.3148298

Eshghi A, Kargari M (2019) Introducing a new method for the fusion of fraud evidence in banking transactions with regards to uncertainty. Expert Syst Appl 121:382–392. https://doi.org/10.1016/j.eswa.2018.11.039

Estupiñán Gaitán R (2015) Control interno y fraudes: análisis de informe COSO I, II y III con base en los ciclos transaccionales, Tercera edición (Niebel BW (ed)). Ecoe Ediciones

Fanai H, Abbasimehr H (2023) A novel combined approach based on deep autoencoder and deep classifiers for credit card fraud detection. Expert Syst Appl 217:119562. https://doi.org/10.1016/j.eswa.2023.119562

Fang Y, Zhang Y, Huang C (2019) Credit card fraud detection based on machine learning. Comput Mater Contin 61(1):185–195. https://doi.org/10.32604/cmc.2019.06144

Femila Roseline J, Naidu G, Samuthira Pandi V, Alamelu alias Rajasree S, Mageswari N (2022) Autonomous credit card fraud detection using machine learning approach. Comput Electr Eng 102:108132. https://doi.org/10.1016/j.compeleceng.2022.108132

García-Ordás MT, Alaiz-Moretón H, Casteleiro-Roca J-L, Jove E, Benítez-Andrades JA, García-Rodríguez I, Quintián H, Calvo-Rolle JL (2023) Clustering techniques selection for a hybrid regression model: a case study based on a solar thermal system. Cybern Syst 54(3):286–305. https://doi.org/10.1080/01969722.2022.2030006

Gupta S, Mehta SK (2021) Data mining-based financial statement fraud detection: systematic literature review and meta-analysis to estimate data sample mapping of fraudulent companies against non-fraudulent companies. Global Bus Rev. https://doi.org/10.1177/0972150920984857

Hajek P, Henriques R (2017) Mining corporate annual reports for intelligent detection of financial statement fraud—a comparative study of machine learning methods. Knowl-Based Syst 128:139–152. https://doi.org/10.1016/j.knosys.2017.05.001

Hamza C, Lylia A, Nadine C, Nicolas C (2023) Semi-supervised method to detect fraudulent transactions and identify fraud types while minimizing mounting costs. Int J Adv Comput Sci Appl 14(2). https://doi.org/10.14569/IJACSA.2023.0140298

Hilal W, Gadsden SA, Yawney J (2022) Financial fraud: a review of anomaly detection techniques and recent advances. Expert Syst Appl 193:116429. https://doi.org/10.1016/j.eswa.2021.116429

Hofmann H (1994) Statlog (German credit data). UCI Machine Learning Repository. https://doi.org/10.24432/C5NC77

Huang D, Mu D, Yang L, Cai X (2018) CoDetect: financial fraud detection with anomaly feature detection. IEEE Access 6:19161–19174. https://doi.org/10.1109/ACCESS.2018.2816564

Hwang J, Kim K (2020) An efficient domain-adaptation method using GAN for fraud detection. Int J Adv Comput Sci Appl 11(11). https://doi.org/10.14569/IJACSA.2020.0111113

Ileberi E, Sun Y, Wang Z (2021) Performance evaluation of machine learning methods for credit card fraud detection using SMOTE and AdaBoost. IEEE Access 9:165286–165294. https://doi.org/10.1109/ACCESS.2021.3134330

Ileberi E, Sun Y, Wang Z (2022) A machine learning based credit card fraud detection using the GA algorithm for feature selection. J Big Data 9(1):24. https://doi.org/10.1186/s40537-022-00573-8

Khan S, Alourani A, Mishra B, Ali A, Kamal M (2022) Developing a credit card fraud detection model using machine learning approaches. Int J Adv Comput Sci Appl 13(3). https://doi.org/10.14569/IJACSA.2022.0130350

Kim J, Kim H-J, Kim H (2019) Fraud detection for job placement using hierarchical clusters-based deep neural networks. Appl Intell 49(8):2842–2861. https://doi.org/10.1007/s10489-019-01419-2

Kim YJ, Baik B, Cho S (2016) Detecting financial misstatements with fraud intention using multi-class cost-sensitive learning. Expert Syst Appl 62:32–43. https://doi.org/10.1016/j.eswa.2016.06.016

Kitchenham B, Brereton P (2013) A systematic review of systematic review process research in software engineering. Inf Softw Technol 55(12):2049–2075. https://doi.org/10.1016/j.infsof.2013.07.010

Kitchenham B, Stuart C (2007) Guidelines for performing systematic literature reviews in software engineering. https://www.researchgate.net/publication/302924724_Guidelines_for_performing_Systematic_Literature_Reviews_in_Software_Engineering

Kootanaee AJ, Aghajan AAP, Shirvani MH (2021) A hybrid model based on machine learning and genetic algorithm for detecting fraud in financial statements. J Optim Ind Eng 14(2):183–201. https://doi.org/10.22094/JOIE.2020.1877455.1685

KPMG (2022) Una triple amenaza en las Américas. KPMG. https://kpmg.com/co/es/home/insights/2022/01/kpmg-fraud-outlook-survey.html

Kumar S, Ahmed R, Bharany S, Shuaib M, Ahmad T, Tag Eldin E, Rehman AU, Shafiq M (2022) Exploitation of machine learning algorithms for detecting financial crimes based on customers’ behavior. Sustainability 14(21):13875. https://doi.org/10.3390/su142113875

Kumbure MM, Lohrmann C, Luukka P, Porras J (2022) Machine learning techniques and data for stock market forecasting: a literature review. Expert Syst Appl 197:116659. https://doi.org/10.1016/j.eswa.2022.116659

Lee H, Choi E, Kim I, Choi D, Go W, Lee K, Yim H, Lee T (2018) Feature selection practice for unsupervised learning of credit card fraud detection. J Theor Appl Inf Technol 96(2):408–417

Google Scholar  

Lei X, Mohamad UH, Sarlan A, Shutaywi M, Daradkeh YI, Mohammed HO (2022) Development of an intelligent information system for financial analysis depend on supervised machine learning algorithms. Inf Process Manag 59(5):103036. https://doi.org/10.1016/j.ipm.2022.103036

Lokanan M, Tran V, Vuong NH (2019) Detecting anomalies in financial statements using machine learning algorithm. Asian J Account Res 4(2):181–201. https://doi.org/10.1108/AJAR-09-2018-0032

Lokanan ME, Sharma K (2022) Fraud prediction using machine learning: The case of investment advisors in Canada. Mach Learn Appl 8:100269. https://doi.org/10.1016/j.mlwa.2022.100269

Lokanan ME (2022) Predicting money laundering using machine learning and artificial neural networks algorithms in banks. J Appl Secur Res 1–25. https://doi.org/10.1080/19361610.2022.2114744

López-Rojas E (2017) Synthetic financial datasets for fraud detection. Kaggle. https://www.kaggle.com/datasets/ealaxi/paysim1

Machine Learning Group (2018) Credit card fraud detection. Kaggle. https://www.kaggle.com/datasets/mlg-ulb/creditcardfraud

Madhurya MJ, Gururaj HL, Soundarya BC, Vidyashree KP, Rajendra AB (2022) Exploratory analysis of credit card fraud detection using machine learning techniques. Glob Transit Proc 3(1):31–37. https://doi.org/10.1016/j.gltp.2022.04.006

Malik EF, Khaw KW, Belaton B, Wong WP, Chew X (2022) Credit card fraud detection using a new hybrid machine learning architecture. Mathematics 10(9):1480. https://doi.org/10.3390/math10091480

Márquez Arcila RH (2019) Auditoría forense. Ecoe Ediciones

Misra S, Thakur S, Ghosh M, Saha SK (2020) An autoencoder based model for detecting fraudulent credit card transaction. Procedia Comput Sci 167:254–262. https://doi.org/10.1016/j.procs.2020.03.219

Moher D, Liberati A, Tetzlaff J, Altman DG (2009) Preferred reporting items for systematic reviews and meta-analyses: the PRISMA statement. PLoS Med 6(7):e1000097. https://doi.org/10.1371/journal.pmed.1000097

Mongwe W, Malan K (2020) A survey of automated financial statement fraud detection with relevance to the South African context. S Afr Comput J 32(1). https://doi.org/10.18489/sacj.v32i1.777

Montes Salazar CA (2019) Riesgos de fraude en una auditoría de estados financieros (1.a ed.). Alfaomega. ISBN: 9789587782639. https://www.alfaomegacloud.com/reader/riesgos-de-fraude-en-una-auditoria-de-estados-financieros?location=3

Moreira MÂL, Junior C, de SR, Silva DF, de L, de Castro Junior MAP, Costa IP, de A, Gomes CFS, dos Santos M (2022) Exploratory analysis and implementation of machine learning techniques for predictive assessment of fraud in banking systems. Procedia Comput Sci 214:117–124. https://doi.org/10.1016/j.procs.2022.11.156

Narsimha B, Raghavendran CV, Rajyalakshmi P, Reddy GK, Bhargavi M, Naresh P (2022) Cyber defense in the age of artificial intelligence and machine learning for financial fraud detection application. Int J Electr Electron Res 10(2):87–92. https://doi.org/10.37391/ijeer.100206

Nian K, Zhang H, Tayal A, Coleman T, Li Y (2016) Auto insurance fraud detection using unsupervised spectral ranking for anomaly. J Financ Data Sci 2(1):58–75. https://doi.org/10.1016/j.jfds.2016.03.001

Nicholls J, Kuppa A, Le-Khac N-A (2021) Financial cybercrime: a comprehensive survey of deep learning approaches to tackle the evolving financial crime landscape. IEEE Access 9:163965–163986. https://doi.org/10.1109/ACCESS.2021.3134076

Nonnenmacher J, Marx Gómez J (2021) Unsupervised anomaly detection for internal auditing: Literature review and research agenda. Int J Digit Account Res 1–22. https://doi.org/10.4192/1577-8517-v21_1

Olszewski D (2014) Fraud detection using self-organizing map visualizing the user profiles. Knowl Based Syst 70:324–334. https://doi.org/10.1016/j.knosys.2014.07.008

Omershafiq (2019) Bitcoin network transactional metadata. Kaggle. https://www.kaggle.com/datasets/omershafiq/bitcoin-network-transactional-metadata

Ounacer S, Ait El Bour H, Oubrahim Y, Ghoumari MY, Azzouazi M (2018) Using isolation forest in anomaly detection: the case of credit card transactions. Period Eng Nat Sci 6(2):394. https://doi.org/10.21533/pen.v6i2.533

Palacio SM (2019) Abnormal pattern prediction: detecting fraudulent insurance property claims with semi-supervised machine-learning. Data Sci J 18(1):35. https://doi.org/10.5334/dsj-2019-035

Papík M, Papíková L (2022) Detecting accounting fraud in companies reporting under US GAAP through data mining. Int J Account Inf Syst 45:100559. https://doi.org/10.1016/j.accinf.2022.100559

Plakandaras V, Gogas P, Papadimitriou T, Tsamardinos I (2022) Credit card fraud detection with automated machine learning systems. Appl Artif Intell 36(1). https://doi.org/10.1080/08839514.2022.2086354

Polak P, Nelischer C, Guo H, Robertson DC (2020) Intelligent” finance and treasury management: what we can expect. AI Soc 35(3):715–726. https://doi.org/10.1007/s00146-019-00919-6

PricewaterhouseCoopers (2022) Encuesta Global de Crimen y Fraude Económico de PwC Colombia 2022 – 2023. https://www.pwc.com/co/es/publicaciones/encuesta-crimen-fraude-economico.html

Pumsirirat A, Yan L (2018) Credit card fraud detection using deep learning based on auto-encoder and restricted Boltzmann machine. Int J Adv Comput Sci Appl 9(1). https://doi.org/10.14569/IJACSA.2018.090103

Putten P (2000) Insurance Company Benchmark (COIL 2000). UCI Machine Learning Repository. https://doi.org/10.24432/C5630S

Quinlan R (1997) Statlog (Australian credit approval). UCI Machine Learning Repository. https://doi.org/10.24432/C59012

Rakowski R, Polak P, Kowalikova P (2021) Ethical aspects of the impact of AI: the status of humans in the era of artificial intelligence. Society 58(3):196–203. https://doi.org/10.1007/s12115-021-00586-8

Ramírez-Alpízar A, Jenkins M, Martínez A, Quesada-López C (2020a) Use of data mining and machine learning techniques for fraud detection in financial statements: a systematic mapping study. Rev Ibér Sist Tecnol Inf Lousada No. E28:97–109

Reurink A (2018) Financial fraud: a literature review. J Econ Surv 32(5):1292–1325. https://doi.org/10.1111/joes.12294

Rocha-Salazar J-J, Segovia-Vargas M-J, Camacho-Miñano M-M (2021) Money laundering and terrorism financing detection using neural networks and an abnormality indicator. Expert Syst Appl 169:114470. https://doi.org/10.1016/j.eswa.2020.114470

Roehrs A, da Costa CA, Righi R, da R, de Oliveira KSF (2017) Personal health records: a systematic literature review. J Med Internet Res 19(1):e13. https://doi.org/10.2196/jmir.5876

Rubio J, Barucca P, Gage G, Arroyo J, Morales-Resendiz R (2020) Classifying payment patterns with artificial neural networks: an autoencoder approach. Lat Am J Cent Bank 1(1–4):100013. https://doi.org/10.1016/j.latcb.2020.100013

Sahin Y, Bulkan S, Duman E (2013) A cost-sensitive decision tree approach for fraud detection. Expert Syst Appl 40(15):5916–5923. https://doi.org/10.1016/j.eswa.2013.05.021

Saputra M, Santosa PI, Permanasari AE (2023) Consumer behaviour and acceptance in fintech adoption: a systematic literature review. Acta Inform Pragensia 12(2):468–489. https://doi.org/10.18267/j.aip.222

Saragih MG, Chin J, Setyawasih R, Nguyen PT, Shankar K (2019) Machine learning methods for analysis fraud credit card transaction. Int J Eng Adv Technol 8(6S):870–874. https://doi.org/10.35940/ijeat.F1164.0886S19

Sathya M, Balakumar B (2022) Insurance fraud detection using novel machine learning technique. Int J Intell Syst Appl Eng 10(3):374–381

Savić M, Atanasijević J, Jakovetić D, Krejić N (2022) Tax evasion risk management using a hybrid unsupervised outlier detection method. Expert Syst Appl 193:116409. https://doi.org/10.1016/j.eswa.2021.116409

Seera M, Lim CP, Kumar A, Dhamotharan L, Tan KH (2021) An intelligent payment card fraud detection system. Ann Oper Res. https://doi.org/10.1007/s10479-021-04149-2

Shahana T, Lavanya V, Bhat AR (2023) State of the art in financial statement fraud detection: a systematic review. Technol Forecast Soc Change 192:122527. https://doi.org/10.1016/j.techfore.2023.122527

Shou M, Bao X, Yu J (2023) An optimal weighted machine learning model for detecting financial fraud. Appl Econ Lett 30(4):410–415. https://doi.org/10.1080/13504851.2021.1989367

Singh A, Jain A, Biable SE (2022) Financial fraud detection approach based on firefly optimization algorithm and support vector machine. Appl Comput Intell Soft Comput 2022:1–10. https://doi.org/10.1155/2022/1468015

Smith Q-J, Valverde R (2021) A perceptron based neural network data analytics architecture for the detection of fraud in credit card transactions in financial legacy systems. WSEAS Trans Syst Control 16:358–374. https://doi.org/10.37394/23203.2021.16.31

Sofy MA, Khafagy MH, Badry RM (2023) An intelligent Arabic model for recruitment fraud detection using machine learning. J Adv Informat Technol. https://doi.org/10.12720/jait.14.1.102-111

Srokosz M, Bobyk A, Ksiezopolski B, Wydra M (2023) Machine-learning-based scoring system for antifraud CISIRTs in banking environment. Electronics 12(1):251. https://doi.org/10.3390/electronics12010251

Subudhi S, Panigrahi S (2020) Use of optimized fuzzy C -Means clustering and supervised classifiers for automobile insurance fraud detection. J King Saud Univ— Comput Inf Sci 32(5):568–575. https://doi.org/10.1016/j.jksuci.2017.09.010

Ti Y-W, Hsin Y-Y, Dai T-S, Huang M-C, Liu L-C (2022) Feature generation and contribution comparison for electronic fraud detection. Sci Rep 12(1):18042. https://doi.org/10.1038/s41598-022-22130-2

Article   ADS   CAS   PubMed   PubMed Central   Google Scholar  

Tingfei H, Guangquan C, Kuihua H (2020) Using variational auto encoding in credit card fraud detection. IEEE Access 8:149841–149853. https://doi.org/10.1109/ACCESS.2020.3015600

Torrano C, Recuero P, Ramirez F, Hernández S, Torres J (2018) Machine learning aplicado a la ciberseguridad: técnicas y ejemplos en detección de amenazas. Zeroxword Computing

Udeze CL, Eteng IE, Ibor AE (2022) Application of machine learning and resampling techniques to credit card fraud detection. J Niger Soc Phys Sci 769. https://doi.org/10.46481/jnsps.2022.769

Usman A, Naveed N, Munawar S (2023) Intelligent anti-money laundering fraud control using graph-based machine learning model for the financial domain. J Cases Inf Technol 25(1):1–20. https://doi.org/10.4018/JCIT.316665

Van Capelleveen G, Poel M, Mueller RM, Thornton D, Van Hillegersberg J (2016) Outlier detection in healthcare fraud: a case study in the Medicaid dental domain. Int J Account Inf Syst 21:18–31. https://doi.org/10.1016/j.accinf.2016.04.001

Vanhoeyveld J, Martens D, Peeters B (2020) Value-added tax fraud detection with scalable anomaly detection techniques. Appl Soft Comput 86:105895. https://doi.org/10.1016/j.asoc.2019.105895

Vanini P, Rossi S, Zvizdic E, Domenig T (2023) Online payment fraud: from anomaly detection to risk management. Financ Innov 9(1):66. https://doi.org/10.1186/s40854-023-00470-w

Vanneschi L, Horn DM, Castelli M, Popovič A (2018) An artificial intelligence system for predicting customer default in e-commerce. Expert Syst Appl 104:1–21. https://doi.org/10.1016/j.eswa.2018.03.025

Viera J, Aguilar J, Rodríguez-Moreno M, Quintero-Gull C (2023) Analysis of the behavior pattern of energy consumption through online clustering techniques. Energies 16(4):1649. https://doi.org/10.3390/en16041649

Wadhwa VK, Saini AK, Kumar SS (2020) Financial fraud prediction models: a review of research evidence. Int J Sci Technol Res 9(1):677–680

West J, Bhattacharya M (2016) Intelligent financial fraud detection: a comprehensive review. Comput Secur 57:47–66. https://doi.org/10.1016/j.cose.2015.09.005

Whiting DG, Hansen JV, McDonald JB, Albrecht C, Albrecht WS (2012) Machine learning methods for detecting patterns of management fraud. Comput Intell 28(4):505–527. https://doi.org/10.1111/j.1467-8640.2012.00425.x

Article   MathSciNet   Google Scholar  

Wohlin C (2014) Guidelines for snowballing in systematic literature studies and a replication in software engineering. In: Proceedings of the 18th international conference on evaluation and assessment in software engineering. pp. 1–10

Wu B, Lv X, Alghamdi A, Abosaq H, Alrizq M (2023) Advancement of management information system for discovering fraud in master card based intelligent supervised machine learning and deep learning during SARS-CoV2. Inf Process Manag 60(2):103231. https://doi.org/10.1016/j.ipm.2022.103231

Article   PubMed   Google Scholar  

Xiong T, Ma Z, Li Z, Dai J (2022) The analysis of influence mechanism for internet financial fraud identification and user behavior based on machine learning approaches. Int J Syst Assur Eng Manag 13(S3):996–1007. https://doi.org/10.1007/s13198-021-01181-0

Xiuguo W, Shengyong D (2022) An analysis on financial statement fraud detection for Chinese listed companies using deep learning. IEEE Access 10:22516–22532. https://doi.org/10.1109/ACCESS.2022.3153478

Yeh I-C (2016) Default of credit card clients. UCI Machine Learning Repository. https://doi.org/10.24432/C55S3H

Zhang Z, Zhou X, Zhang X, Wang L, Wang P (2018) A model based on convolutional neural network for online transaction fraud detection. Secur Commun. Netw. 2018:1–9. https://doi.org/10.1155/2018/5680264

Zhao Z, Bai T (2022) Financial fraud detection and prediction in listed companies using SMOTE and machine learning algorithms. Entropy 24(8):1157. https://doi.org/10.3390/e24081157

Zhou H, Chai H, Qiu M (2018) Fraud detection within bankcard enrollment on mobile device based payment using machine learning. Front Inf Technol Electron Eng 19(12):1537–1545. https://doi.org/10.1631/FITEE.1800580

Zupan M, Budimir V, Letinic S (2020) Journal entry anomaly detection model. Intell Syst Account Financ Manag 27(4):197–209. https://doi.org/10.1002/isaf.1485

Download references

Acknowledgements

We would like to express our gratitude to the Universidad Cooperativa de Colombia, Ibagué campus, Espinal. This research work was supported by Universidad Cooperativa de Colombia and derived from research project INV3456 entitled “Detection of anomalies in financial data in social economy organizations through machine learning techniques” associated with the PLANAUDI, AQUA and SINERGIA UCC group, from the Research Center of the Public Accounting and Systems Engineering program of the UCC Ibagué campus.

Author information

Authors and affiliations.

School of Public Accounting, Universidad Cooperativa de Colombia, 730001, Ibagué-Espinal campus, Ibagué, Colombia

Ludivia Hernandez Aros & John Johver Moreno Hernandez

School of Systems Engineering, Universidad Cooperativa de Colombia, 730001, Ibagué-Espinal campus, Ibagué, Colombia

Luisa Ximena Bustamante Molano & Fernando Gutierrez-Portela

School of Business Administration, Universidad Cooperativa de Colombia, 730001, Ibagué-Espinal campus, Ibagué, Colombia

Mario Samuel Rodríguez Barrero


Contributions

All authors contributed to the creation and design of the study.

Corresponding author

Correspondence to Ludivia Hernandez Aros.

Ethics declarations

Competing interests.

The authors declare no competing interests.

Ethical approval and consent to participate

The authors declare that they have no human participants, human data, or human tissue.

Consent to publish

The authors have no data from any individual person in any form.

Additional information

Publisher’s note Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.

Rights and permissions

Open Access This article is licensed under a Creative Commons Attribution-NonCommercial-NoDerivatives 4.0 International License, which permits any non-commercial use, sharing, distribution and reproduction in any medium or format, as long as you give appropriate credit to the original author(s) and the source, provide a link to the Creative Commons licence, and indicate if you modified the licensed material. You do not have permission under this licence to share adapted material derived from this article or parts of it. The images or other third party material in this article are included in the article’s Creative Commons licence, unless indicated otherwise in a credit line to the material. If material is not included in the article’s Creative Commons licence and your intended use is not permitted by statutory regulation or exceeds the permitted use, you will need to obtain permission directly from the copyright holder. To view a copy of this licence, visit http://creativecommons.org/licenses/by-nc-nd/4.0/ .


About this article

Cite this article.

Hernandez Aros, L., Bustamante Molano, L.X., Gutierrez-Portela, F. et al. Financial fraud detection through the application of machine learning techniques: a literature review. Humanit Soc Sci Commun 11, 1130 (2024). https://doi.org/10.1057/s41599-024-03606-0

Received: 15 November 2023

Accepted: 13 August 2024

Published: 03 September 2024

DOI: https://doi.org/10.1057/s41599-024-03606-0


Open Access

Peer-reviewed

Research Article

Does managerial myopia promote enterprises over-financialization? Evidence from listed firms in China

Roles Data curation, Formal analysis, Funding acquisition, Supervision, Validation, Visualization, Writing – original draft, Writing – review & editing

Affiliation Institute of Food and Strategic Reserves, Nanjing University of Finance and Economics, Nanjing, Jiangsu, China

Roles Conceptualization, Data curation, Formal analysis, Funding acquisition, Methodology, Supervision, Validation, Visualization, Writing – original draft, Writing – review & editing


Roles Data curation, Funding acquisition, Project administration, Visualization, Writing – original draft, Writing – review & editing

Affiliation Economic School, Nanjing University of Finance and Economics, Nanjing, Jiangsu, China


This paper analyzes the potential shortsightedness of enterprise managers through annual reports. Additionally, we use corporate financial statement data to measure enterprise over-financialization in terms of resource allocation. After testing with a causal inference model, we find that managerial myopia significantly contributes to over-financialization. This result remains robust when an instrumental variable, namely whether the manager has experienced a famine, is used. Furthermore, financial distress and financing constraints amplify the inclination of short-term-focused managers to amass greater financial assets.

Citation: Chen Y, Ye J, Shi Q (2024) Does managerial myopia promote enterprises over-financialization? Evidence from listed firms in China. PLoS ONE 19(9): e0309140. https://doi.org/10.1371/journal.pone.0309140

Editor: Wajid Khan, University of Baltistan, PAKISTAN

Received: February 25, 2024; Accepted: August 7, 2024; Published: September 5, 2024

Copyright: © 2024 Chen et al. This is an open access article distributed under the terms of the Creative Commons Attribution License, which permits unrestricted use, distribution, and reproduction in any medium, provided the original author and source are credited.

Data Availability: The data that support the findings of this study are available at http://doi.org/10.57760/sciencedb.10232, reference number https://www.scidb.cn/en/s/eEFzeq.

Funding: This work was supported by the National Social Science Fund of China under Grant 22VRC007 and by the Institute of Food and Strategic Reserves, Nanjing University of Finance and Economics, under Grant BSZX2023-07. The funders had no role in study design, data collection and analysis, decision to publish, or preparation of the manuscript.

Competing interests: The authors have declared that no competing interests exist.

Introduction

Economic financialization has emerged as a significant driver of stagnating economic growth and declining productivity [1]. This phenomenon is predominantly attributed to the growing adoption of financialized practices among corporate entities, which squeezes out real investment and hinders firms' long-term growth [1, 2]. Moreover, the prevalence of corporate financialization has impeded the overall development of the macroeconomy, exacerbating economic operational risks and the restructuring of industrial sectors. Consequently, scholarly attention has increasingly focused on the motivations driving corporate financialization in order to forestall its broader economic implications [3, 4]. However, corporate financialization is also propelled by “reserving” and “profit-seeking” motives, aimed primarily at mitigating financing constraints and engaging in speculative arbitrage. This not only helps reduce the risk of funding interruption but also facilitates high short-term returns [5, 6], thereby favoring corporate development. Consequently, this study advocates heightened scrutiny of corporate over-financialization, with managerial myopia identified as one of its primary catalysts.

It is crucial to clarify that corporate financialization behavior is closely associated with a firm's particular financial condition and operating performance, and that managerial myopia is a key factor in squeezing funds and hindering long-term growth. Corporate financialization is a multifaceted concept whose economic outcomes require nuanced investigation. Consider high- and low-performing companies that hold financial assets: high-performing enterprises adequately meet their investment needs and are left with surplus idle funds, whereas low-performing enterprises exhibit more investment substitution. However, when corporations are led by myopic managers, they tend to prefer short-term, quick-return investment projects [7] and make short-term profit decisions at the expense of long-term interests [8], which exacerbates squeezing-oriented, profit-seeking behavior. Nonetheless, existing research has tended to overemphasize the negative impacts of corporate financialization by exploring the motives and economic effects of financialization behavior from a homogeneous perspective, thus neglecting its potential benefits [9, 10].

This paper emphasizes that enterprise financialization behavior is not equivalent to managerial myopia. Financialization is a “double-edged sword”: an investment decision motivated by resource allocation. It therefore does not necessarily harm the long-term sustainable development of enterprises. Accordingly, this paper takes Chinese non-financial listed companies from 2005 to 2022 as its sample, quantifies managerial myopia through the analysis of annual report texts, and matches these data with the China Stock Market and Accounting Research Database (CSMAR) to obtain financial information for each company and identify indicators of corporate excessive financialization. Using a mixed-effects OLS model and a logistic model, it then explores the impact of managerial myopia on corporate excessive financialization from the perspective of corporate governance. The empirical results indicate that Chinese non-financial listed companies generally engage in financial investment. Notably, a considerable number of companies exhibit a tendency towards excessive financialization, and managerial myopia is a key factor contributing to this phenomenon. The results remain robust after potential endogeneity issues are addressed. Additionally, the study finds that internal financing constraints and financial distress exacerbate the tendency of myopic managers to engage in higher financial investment to smooth short-term benefits, while increasing the risk of excessive financialization.

The marginal contributions of this study are as follows: Firstly, we are among the first to construct an indicator of excessive financialization based on the perspective of corporate financial heterogeneity. This enriches research on corporate financialization, helping scholars to correctly grasp the dual nature of financialization and avoid its negative effects. Secondly, we consider the differences between managerial myopia and corporate financialization, empirically testing the phenomenon of excessive financialization caused by managerial myopia, helping shareholders better understand the underlying motives of financialization and achieve a balance between short-term and long-term interests. Finally, the study also considers, through moderation effect models, the impact of managerial myopia on corporate excessive financialization under external conditions such as financing constraints and financial distress, providing a theoretical basis for better corporate development.
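As a hedged illustration of the empirical design summarized above, the baseline, logistic, and moderation models could be written roughly as follows. The symbols are assumptions introduced here for exposition and are not the authors' notation: $\mathit{OverFin}_{i,t}$ is a continuous over-financialization measure for firm $i$ in year $t$, $D_{i,t}$ is an indicator of over-financialization, $\mathit{Myopia}_{i,t}$ is the text-based myopia index, $M_{i,t}$ is a moderator (financing constraints or financial distress), $X_{i,t}$ is a vector of controls, and $\Lambda$ is the logistic function. The exact specification is the one given in the “Research Methodology” section.

```latex
\begin{align}
\mathit{OverFin}_{i,t} &= \beta_0 + \beta_1\,\mathit{Myopia}_{i,t} + \gamma^{\top} X_{i,t} + \varepsilon_{i,t} \\
\Pr\left(D_{i,t} = 1\right) &= \Lambda\!\left(\alpha_0 + \alpha_1\,\mathit{Myopia}_{i,t} + \delta^{\top} X_{i,t}\right),
\qquad \Lambda(z) = \frac{1}{1 + e^{-z}} \\
\mathit{OverFin}_{i,t} &= \theta_0 + \theta_1\,\mathit{Myopia}_{i,t} + \theta_2\,M_{i,t}
 + \theta_3\,\mathit{Myopia}_{i,t} \times M_{i,t} + \gamma^{\top} X_{i,t} + \varepsilon_{i,t}
\end{align}
```

Under this sketch, $\beta_1 > 0$ and $\alpha_1 > 0$ would correspond to myopia promoting over-financialization, and $\theta_3 > 0$ to the amplifying role of financing constraints or financial distress.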

The subsequent sections of this paper are organized as follows. Firstly, the “Literature Review” section provides a comprehensive overview of existing research on this topic. Secondly, the “Research Methodology” section delineates the model specifications and outlines the sample selection process applied in the empirical analysis. Following this, the “Empirical Analysis” section presents the findings obtained from the analysis and rigorously examines their robustness. Lastly, the “Conclusion” section summarizes and concludes the paper.

Literature review

The literature relevant to the impact of managerial myopia on excessive corporate financialization mainly focuses on the effects of corporate financialization and its underlying logic, and on the measurement of managerial myopia and its influencing factors. Scholarly concern about the impact of corporate financialization is concentrated mainly in the macroeconomic domain. Epstein [11] defines financialization as the escalating influence of financial motives, markets, actors, and institutions on both domestic and international economies. Krippner [12], in contrast, characterizes financialization as a mode of accumulation in which profits accrue predominantly through financial channels rather than through conventional trade and commodity production. Under either definition, the past four decades have witnessed a swift and substantial surge in financialization in the United States and globally. This escalating trend has not gone unnoticed by researchers, who have documented its detrimental impact on overall economic growth. The proliferation of debt-based financial networks has compounded existing economic and social disparities [4, 13]. Similarly, Hein [14] contends that financialization is a primary driver of modern capitalist stagnation, intensifying global macroeconomic risks. Macro-financialization is a concentrated expression of pervasive micro-financialization, which has prompted scholars to examine the behavioral rationale for corporate financialization and to show that the financialization of non-financial firms curtails tangible investment, thereby acting as a primary catalyst for sluggish economic growth and reduced productivity [1].

Subsequently, scholars have analyzed the underlying behavioral logic of corporate financialization with a view to reducing it [15, 16]. Non-financial firms invest in financial assets owing to a reserve motive or a profit-seeking motive [5, 6]. The reserve motive, rooted in Keynes’ perspective [17], posits that holding liquid, realizable assets helps alleviate funding constraints. Conversely, the profit-seeking motive arises from the emergence of financial markets, which enable firms to capitalize on carry and arbitrage trading opportunities driven by domestic currency appreciation [18]. Orhangazi [2] underscores that managers may squeeze out their own fixed-asset investment owing to the lure of financialized arbitrage, compromising long-term interests for short-term gains. In China, the highly financialized real estate industry also faces the negative impact of debt risk shifting to banks, which warrants vigilance from society and government [19]. Davis [20] presents opposing evidence, suggesting that financialization can stimulate fixed investment. Furthermore, corporate finance exhibits externalities, promoting innovation by alleviating financing constraints for other enterprises [21]. Societal concerns regarding corporate financialization therefore stem primarily from its “profit-seeking” effects rather than from the “reserve” motive. Some scholars persist in using the financial asset ratio as an indicator of financialization, which may neglect the potential benefits of financialization and overemphasize its drawbacks [10]. Effective resource planning and management are important pathways for enterprises to achieve their expected goals [22]. In contrast, Song & Wu [23] and Wang et al. [24] contend that financialization involves a willingness to assume more financial risk when facing operational and financial challenges. Over-financialization is deemed to occur when these risks surpass expectations, a view that offers valuable insights for refining the theoretical boundaries of financialization and optimizing corporate governance practices.

Research on managerial myopia tends to focus on measurement methods that can effectively reflect managers’ behavioral tendencies. Shortsighted behavior exists in various fields, such as managerial shortsightedness, market shortsightedness, and investment shortsightedness [25–27]. Among these, the behavioral dimensions of managers have been examined extensively, with scholars contending that managers, as the helmsmen of the enterprise, wield significant influence over strategy formulation and decision-making [28–30]. To help shareholders and other sectors of society recognize managers’ shortsighted behavior, some scholars have captured it by analyzing the word types and word frequencies used in the language of the subjects studied [25, 31]. Text can be used to describe the characteristics of managers for two reasons. First, text can effectively capture the characteristics of people. For example, the more a person’s language emphasizes “past”, “once”, and similar words, the more attention they pay to the past; the more it emphasizes words such as “future”, “possible”, and “to go”, the more attention they pay to the future [32]. Second, the characteristics of managers greatly affect the characteristics of corporate information disclosure [33]. Management discussion and analysis (MD&A) is a manager’s review of the business situation during the reporting period, together with an exposition of the opportunities, challenges, and risks facing the next year’s business plan and the future development of the company, and it can therefore directly reveal the characteristics of managers. Li [34] reports that texts such as MD&A are a reliable basis for depicting managers’ traits.

Regarding the impact of managerial myopia on corporate financialization, managerial myopia is characterized by managers prioritizing short-term profit decisions at the potential cost of the company’s long-term interests [ 8 ]. From the perspective of economic motivation, managers, out of consideration for their own position, salary, and reputation, may exploit information asymmetry to choose short-term investment schemes that quickly generate returns, rather than making strategic decisions in the company’s long-term best interests. For example, Gopalan et al. [ 35 ] report that the shorter the average duration of the management compensation contract, the more likely management is to behave shortsightedly. Bolton [ 36 ] finds that shortsighted managers may sacrifice the long-term interests of enterprises to obtain excess compensation brought about by stock price fluctuations. Graham et al. [ 37 ] reveal that, to obtain stable income and maintain their reputation, managers may adopt hidden profit manipulation methods and even undertake activities that sacrifice the long-term value of the enterprise. From the perspective of external pressure, the investment preference of short-term institutional investors [ 38 ], analyst coverage [ 25 ], and the frequency of financial report disclosures [ 39 ] may affect managerial shortsightedness. In this context, financialization emerges as a novel form of earnings management and a means of adjusting book profits, appealing to shortsighted managers and consequently intensifying the risk of over-financialization [ 40 ]. Although managers tend to prioritize growth and shareholders emphasize profits, the performance of financialization aligns with the shared preferences of both shortsighted managers and shareholders. This convergence of interests has led to the interconnection of corporate financialization and managerial myopia [ 41 ].

Hence, a fundamental contradiction between micro-level enterprises and macroeconomic financialization is the widespread financialization of enterprises, in which the degree of financialization does not match the level of the firms’ own resource management, resulting in the crowding-out of fixed-asset investment, a shift “from real to virtual”. It must be admitted that financialization is a kind of financial investment behavior driven by enterprise resource management and is, to a certain extent, conducive to enterprise development. Therefore, rather than financialization in general, this study considers it more important to prevent the excessive financialization caused by managers’ short-sightedness. The main mechanism is shown in Fig 1 , where corporate financialization decisions are mainly influenced by “reserve motives” and “profit motives”. Profit-motivated enterprises are subdivided into “investment substitution” and “investing surplus” types. The excessive financialization measures taken by “investment substitution” enterprises under managerial myopia cause the shift “from real to virtual”, whereas “investing surplus” enterprises take moderate financialization measures to maximize profit.

Fig 1. https://doi.org/10.1371/journal.pone.0309140.g001

Research methodology

Operational definitions of research variables

Over-financialization.

Clarifying the degree of corporate financialization and delineating the “moderation” boundary are foundational steps in the research process. This paper follows the definition of Demir [ 42 ] and uses the ratio of financial assets to total assets as an index of the degree of financialization, where financial assets mainly include trading financial assets, available-for-sale financial assets, held-to-maturity investments, loans and advances, derivative financial instruments, long-term equity investments, and investment properties.

[Eq (1): equation image not preserved in this text]

Through Eq ( 1 ), the optimal financialization level of each enterprise can be fitted. The residual ε , which is the degree of over-financialization ExFin in this research, is the distance between the enterprise’s actual financialization degree and the optimal level. A positive residual indicates that an enterprise is over-financialized, and a negative residual means that the firm is financialized within a moderate range. The larger the residual, the more over-financialized the enterprise is. In addition, we construct an indicator, IExFin , of whether there is over-financialization, based on the optimal financialization level. When ExFin >0, the enterprise is over-financialized, and IExFin is assigned 1; otherwise, the enterprise’s financialization is within a moderate range, and IExFin is assigned 0.
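The residual-based construction can be sketched in Python as follows. This is a minimal illustration, not the paper's specification: the regressors of Eq ( 1 ) are not reproduced in this text, so `determinants` is a hypothetical placeholder matrix, and plain OLS is assumed for the fit.

```python
import numpy as np

def over_financialization(fin, determinants):
    """Return (ExFin, IExFin) from an OLS fit of `fin` on `determinants`.

    fin          : shape (n,), ratio of financial assets to total assets
    determinants : shape (n, k), HYPOTHETICAL regressors standing in for
                   the right-hand side of Eq (1), which is not given here
    """
    fin = np.asarray(fin, dtype=float)
    X = np.column_stack([np.ones(len(fin)), np.asarray(determinants, dtype=float)])
    beta, *_ = np.linalg.lstsq(X, fin, rcond=None)
    optimal = X @ beta                    # fitted "optimal" financialization level
    ex_fin = fin - optimal                # residual: distance from the optimum
    i_ex_fin = (ex_fin > 0).astype(int)   # 1 if over-financialized, else 0
    return ex_fin, i_ex_fin
```

A positive entry of the first return value corresponds to ExFin > 0, for which the indicator is 1; all other observations receive 0.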

Managerial myopia.

In the field of managerial myopia, the method of capturing managerial strategic information through word frequency text has become relatively mature. Brochet et al. [ 25 ] report on the word frequency ratio of “time-domain” vocabulary to effectively capture managerial myopia. Similarly, Hu et al. [ 31 ] build a comprehensive set of word frequency statistics tailored to the specific context of managerial myopia in Chinese enterprises owing to the nuances between Chinese and English dictionaries, and then they curate a selection of 43 words under the “shortsighted domain” category as managerial myopia indicators. This sophisticated dictionary-based method enables a nuanced understanding of managerial myopia within the Chinese business landscape.

The specific steps for measuring managerial myopia are as follows: (1) Convert to TXT (text) format: The Portable Document Format (PDF) annual reports of A-share listed companies in Shanghai and Shenzhen are converted into TXT format using Python; (2) Extract the MD&A chapters, which often encapsulate critical insights into managerial strategies, from the annual financial reports; (3) Perform word segmentation: The Jieba toolkit, a versatile Chinese language processing tool in Python, is employed to segment the annual report text of the listed companies into words. Then, the frequency of “myopia field” words is counted after filtering out irrelevant words and stop words; (4) Calculate the total word frequency of the MD&A chapters, again using Python’s Jieba toolkit; (5) Divide the frequency of “myopia field” words by the total word frequency of the MD&A and multiply the result by 100 to obtain the managerial myopia index ( Myopia ). The larger the Myopia value, the more myopic the manager.
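Steps (3) to (5) can be sketched as below. This is an illustrative sketch under stated assumptions: the word list, stop-word list, and sample text are hypothetical, the total count is assumed to be taken after stop-word filtering, and the tokenizer defaults to whitespace splitting so that the example runs without extra packages. For Chinese text, one could pass `jieba.lcut` as `tokenize`, assuming the Jieba package is installed.

```python
from collections import Counter

def myopia_index(mdna_text, myopia_words, stop_words=frozenset(), tokenize=str.split):
    """Share of "myopia field" words in an MD&A text, multiplied by 100.

    Assumption: the denominator is the token count after stop-word removal.
    """
    tokens = [t for t in tokenize(mdna_text) if t.strip() and t not in stop_words]
    if not tokens:
        return 0.0
    counts = Counter(tokens)
    hits = sum(counts[w] for w in set(myopia_words))  # set() avoids double counting
    return 100.0 * hits / len(tokens)
```

For instance, `myopia_index("we will act soon and soon grow", {"soon"}, stop_words={"and"})` counts two hits among six retained tokens.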

Control variables

Financialization is the investment decision of enterprises based on resource endowment, which means that the current level of financialization reflects the ongoing enterprises’ adjustments in resource allocation. In addition to investigating managerial myopia, we incorporate several control variables to perform a comprehensive analysis. These variables include 1) firm size: represented as the natural logarithm of total assets at the year-end; 2) firm age: expressed as the natural logarithm of the firm’s operating years; 3) firm growth opportunity: measured by the growth rate of the year-end operating income; 4) firm debt ratio: defined as the ratio of total liabilities to total assets at the year-end; 5) firm liquidity: calculated as the ratio of the company’s monetary assets to total assets at the year-end; 6) firm net interest rate: captured as the natural logarithm of the firm’s net profit at the year-end. These variables collectively shed light on various aspects of the enterprise’s operations, contributing to a more nuanced understanding of how managerial decision-making and financialization interplay within the broader business context.

Taking into account the multifaceted nature of capturing managerial myopia within corporate annual reports and the collaborative nature of determining firms’ financialization levels involving managers and shareholders, we incorporate a range of variables to ensure a comprehensive analysis. These variables, which serve as control measures, account for various dimensions of decision-making and governance within the enterprise: 1) equity market value: represented by the natural logarithm of the total market value of the enterprise’s stock market, capturing the market perception and valuation of the company’s worth; 2) Director-Cum-CEO: a binary indicator variable, taking the value of 1 when the CEO concurrently holds the position of the chairman of the board of directors and 0 otherwise. This variable accounts for the potential concentration of decision-making power; 3) proportion of independent directors: calculated as the ratio of independent directors to the total number of directors, reflecting the extent to which external perspectives influence governance; 4) ownership concentration: represented as the proportion of the top 10 shareholders’ collective ownership in the company’s total shares. Moreover, firm-specific fixed effects and time-specific fixed effects are introduced to mitigate the impact of unobserved or omitted variables that can potentially confound the results.

Regression model of research

We employ an econometric model to test the causal relationship between managerial myopia and enterprise over-financialization. The research examines two aspects of the effect of managerial myopia: its impact on the degree of over-financialization and its potential role in causing over-financialization. Importantly, the study distinguishes between the continuous variable representing the degree of over-financialization and the binary variable indicating the presence or absence of over-financialization. As a result, different model settings are applied for these two aspects to capture the relationship effectively.

[Regression model equations: images not preserved in this text]
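For the continuous outcome, a fixed-effects regression can be illustrated with the within (demeaning) transformation. The sketch below is a simplified assumption, not the paper's exact model: it removes firm effects only, omits time effects and standard errors, and uses hypothetical argument names.

```python
import numpy as np

def within_estimator(y, X, groups):
    """One-way fixed-effects slopes via group demeaning (firm effects only)."""
    y = np.asarray(y, dtype=float)
    X = np.asarray(X, dtype=float).reshape(len(y), -1)
    groups = np.asarray(groups)
    y_d, X_d = y.copy(), X.copy()
    for g in np.unique(groups):
        mask = groups == g
        y_d[mask] -= y[mask].mean()          # subtract the firm mean of the outcome
        X_d[mask] -= X[mask].mean(axis=0)    # subtract the firm means of the regressors
    beta, *_ = np.linalg.lstsq(X_d, y_d, rcond=None)
    return beta
```

Demeaning within each firm removes any time-invariant firm effect, so the slope is identified from variation within firms over time.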

Sample selection

We focus on a comprehensive sample of listed companies in China from 2005 to 2022. To ensure the robustness and accuracy of the empirical analysis, we treat the sample as follows: (1) textual analysis inclusion: The research primarily employs textual analysis to capture managerial myopia characteristics, so firms with unpublished or discontinuous annual reports are eliminated to ensure the completeness of the explanatory variable data; (2) financialization focus: The research specifically investigates the financialization behavior of real enterprises, excluding financial firms such as banks, securities, insurance, and trust companies; (3) exclusion of specific types: Companies categorized as *ST, ST, and PT are excluded to avoid outliers in the sample; (4) missing variable removal: Samples with missing relevant variables in the financial statements are removed from the analysis; (5) asset-liability ratio threshold: Financially abnormal samples with asset-liability ratios exceeding 100% are eliminated; (6) winsorization: To mitigate the impact of extreme values, we winsorize all continuous variables at the 1% and 99% levels. Following these criteria, the final sample comprises 14,870 observations of unbalanced panel data. Table 1 reports the final sample distribution by industry; Code J, the identifier for the financial industry in the Guidelines for Industry Classification of Listed Companies, is absent. Sample firms are mainly in the manufacturing sector, consistent with the industry distribution of listed companies in China, and the Code C samples total 10,584 observations, accounting for 71.18%. The data on managerial myopia are extracted from the annual reports of A-share listed companies in the Shanghai and Shenzhen stock markets, and a Python program is employed to perform web crawling and compile the relevant word frequency statistics. Additional variable data are derived from the CSMAR database.
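The winsorization in step (6) can be sketched as follows. This is a minimal sketch assuming percentile cut-offs computed over the pooled sample with NumPy's default linear interpolation; the paper does not state how the cut-offs were computed.

```python
import numpy as np

def winsorize(values, lower=1.0, upper=99.0):
    """Clip values to the [lower, upper] percentile range of the sample."""
    values = np.asarray(values, dtype=float)
    lo, hi = np.percentile(values, [lower, upper])
    return np.clip(values, lo, hi)
```

Unlike trimming, this keeps every observation and only replaces values beyond the cut-offs with the cut-off values themselves.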

Table 1. https://doi.org/10.1371/journal.pone.0309140.t001

Empirical analysis

Descriptive statistics.

Descriptive statistics of the research variables are shown in Table 2 , covering the selected listed real enterprises in China from 2005 to 2022, a total of 14,870 observations after processing. The degree of over-financialization of Chinese enterprises is roughly normally distributed, with a mean value around 0, a minimum of −31.01, and a maximum of 55.64. In terms of whether over-financialization exists, a considerable portion of the sample is over-financialized: the mean value of the indicator is 0.34, so more than 60% of enterprises are in a rational financialization state. Regarding the core independent variable, Chinese enterprises show different degrees of managerial myopia. The proportion of “managerial myopia” words within the total word frequency of MD&A in the annual reports lies in the range [0.01%, 1.96%], and the average value of managerial myopia is approximately 0.23% across the entire sample.

Table 2. https://doi.org/10.1371/journal.pone.0309140.t002

At the level of enterprise resource control, the asset sizes of Chinese listed real enterprises do not differ much after enterprises with abnormal debt ratios are eliminated. Most enterprises maintain a high level of business growth, with corporate liquidity and net profit roughly normally distributed. At the management control level, the average value of director-cum-CEO in Chinese enterprises is 0.26, indicating that in about a quarter of the sample the chairman of the board also serves as CEO and directly manages the company. The mean proportion of independent directors on the board is 37.10%, and the vast majority of enterprises comply with the requirement of the independent director system that independent directors make up at least one-third of the board. The average concentration of the top 10 shareholders of each sample is 10%, but substantial variation exists among enterprises, with the highest equity concentration reaching 64.29%.

Tests related to the classical hypothesis of regression

Table 3 presents the outcomes of the model tests. In this section, the variance inflation factor (VIF) test, F-Limer test, and Hausman test are performed on the fixed-effects model and the logistic model. The results show no serious multicollinearity among the variables, as evidenced by a VIF value of 1.3. This implies that the examined variables are not highly correlated, thereby supporting the reliability of the model. The significance levels of the F-Limer test for the two models are both below 5%, which indicates that a panel data model is appropriate. The Hausman test also yields significance levels below 5%, so the fixed-effects specification is deemed suitable.
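To illustrate how a VIF is obtained, the sketch below regresses each regressor on the others (with an intercept) and computes 1 / (1 − R²). It is a generic textbook computation with hypothetical input, not the software routine used in the paper.

```python
import numpy as np

def vif(X):
    """Variance inflation factor for each column of X (shape (n, k), k >= 2)."""
    X = np.asarray(X, dtype=float)
    n, k = X.shape
    factors = []
    for j in range(k):
        target = X[:, j]
        others = np.column_stack([np.ones(n), np.delete(X, j, axis=1)])
        beta, *_ = np.linalg.lstsq(others, target, rcond=None)
        resid = target - others @ beta
        r2 = 1.0 - resid.var() / target.var()   # R-squared of the auxiliary regression
        factors.append(1.0 / (1.0 - r2))
    return np.array(factors)
```

Values close to 1, such as the 1.3 reported above, indicate that a regressor is only weakly explained by the others.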

Table 3. https://doi.org/10.1371/journal.pone.0309140.t003

Result of the research hypothesis test

The regression results of the effect of managerial myopia on corporate over-financialization are shown in Table 4 . Column (1) shows the linear regression results of managerial myopia on the degree of corporate over-financialization, and Column (2) presents whether managerial myopia triggers corporate over-financialization. Column (1) shows that the coefficient of Myopia on ExFin is positive and significant (p<0.05). This means that the degree of corporate over-financialization increases by 11.24% (0.7424 × 0.1514) for every one-standard-deviation increase in managerial myopia. Moreover, Column (2) does not report a constant term or R-squared owing to the panel fixed-effects logistic model used in the test. After 1,442 groups (5,750 obs) of firm samples are dropped from the panel logistic regression because their outcomes are all positive or all negative, the coefficient of Myopia on IExFin remains positive and significant (p<0.05), and the probability of over-financialization increases by 7.01% (0.4633 × 0.1514) when managerial myopia increases by one standard deviation. The results indicate that managerial myopia manifests in short-termism, leading to financial asset allocation beyond the reasonable level of resource allocation for enterprises, which is detrimental to long-term development.
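The two effect sizes quoted above are products of a coefficient and the standard deviation of Myopia . The short check below reproduces that arithmetic; the function name is illustrative. Strictly speaking, in a logit model this product is a change on the log-odds scale, which the text reads as a change in probability.

```python
def effect_per_sd(coefficient, std_dev):
    """Change in the outcome for a one-standard-deviation rise in the regressor."""
    return coefficient * std_dev

SD_MYOPIA = 0.1514  # standard deviation of Myopia implied by the text

linear_effect = round(100 * effect_per_sd(0.7424, SD_MYOPIA), 2)  # Column (1)
logit_effect = round(100 * effect_per_sd(0.4633, SD_MYOPIA), 2)   # Column (2)
print(linear_effect, logit_effect)  # 11.24 7.01
```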

Table 4. https://doi.org/10.1371/journal.pone.0309140.t004

Robustness check

To enhance the robustness of the empirical results, we conduct various sensitivity analyses involving adjustments to the sample interval, the replacement of variables, and endogeneity testing. The outcomes of these analyses are presented in Table 5 (adjusting sample interval and replacement variables) and Table 6 (endogeneity test results).

Table 5. https://doi.org/10.1371/journal.pone.0309140.t005

Table 6. https://doi.org/10.1371/journal.pone.0309140.t006

Eliminating the impact of the financial crisis

The 2008 global financial crisis stemmed from improper real estate financial policies and the misuse of financial derivatives, which clearly fueled excessive financialization behavior in enterprises. After the crisis, some enterprises learned from experience, leading to a moderation of excessive financialization. Hence, this part excludes the impact of the 2008 global financial crisis and adjusts the study sample to the period from 2009 to 2022. Columns (1) and (2) of Table 5 report the effect of managerial myopia on the degree of over-financialization and on whether over-financialization occurs. The results show that the coefficient of Myopia is positive and significant (p<0.10). Even after the influence of the financial crisis is excluded, managerial myopia still promotes excessive financialization in enterprises. This is consistent with the previous section, meaning that the conclusions are robust.

Replacement of variable measure

The substitution of explanatory and dependent variables is a commonly used method in robustness testing, aiming to examine whether the relationship between variables still holds. The degree of financialization, Fin , is used to replace the over-financialization index in this part, and the results are shown in Table 5 . The coefficient of Myopia remains positive and significant (p<0.10). The conclusion that managerial myopia increases financial asset allocation is consistent with the findings of [ 44 ]. The two measures differ, however, in what they explain: managerial myopia increases the degree of enterprises’ financialization, but an increase in the degree of financialization may still be conducive to the optimal financialization of firms. Subsequently, this part uses the stock turnover rate to replace the indicator of managerial myopia, because managers largely adopt short-term behaviors to improve market valuation and cater to investors, and these short-term behaviors increase the stock turnover rate of enterprises. The coefficient of the turnover-rate variable, ART , is positive and significant (p<0.05), which further supports this idea.

Endogeneity test

Causal inference should exclude bidirectional causality between the independent and dependent variables; for example, short-term gains from corporate financialization may lead to managerial overconfidence, making managers more myopic. Therefore, the paper adopts the instrumental variable (IV) approach to deal with endogeneity, and different IV models are selected for the degree of over-financialization and for the indicator of whether over-financialization is present. The traditional two-stage least squares (2SLS) test is employed for the former. Since no instrumental variable model exists for the fixed-effects logit, an alternative model (the IV-Probit model) is used for the latter.

As the instrumental variable, we choose whether the manager has experienced famine. China refers to 1959–1961 as the “Three-year Difficult Period” or “Three-year Natural Disasters.” During this period, China’s farmlands suffered from large-scale natural disasters for several years in a row, and the country faced a nationwide food shortage crisis with about 2.5 million deaths due to starvation. Managers’ early life experiences tend to influence their corporate decision-making. Managers who experienced famine in their early years tend to be conservative in deciding whether to over-financialize their businesses, and they set aside part of the capital to cope with the “famine.” Therefore, the instrumental variable is set to 1 for managers born before 1959 and to 0 for those born after 1959.
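The instrument and the 2SLS procedure can be sketched as below. This is a simplified illustration under stated assumptions: the function and argument names are hypothetical, managers born in 1959 itself are coded 0 (the text does not specify this case), and only the point estimate is returned, because standard errors taken from a manually run second stage would not be valid.

```python
import numpy as np

def famine_dummy(birth_years, cutoff=1959):
    """1 if born before the cutoff year, else 0 (birth in 1959 coded 0 by assumption)."""
    return (np.asarray(birth_years) < cutoff).astype(int)

def two_stage_least_squares(y, x_endog, z, controls=None):
    """2SLS point estimate for one endogenous regressor and one instrument."""
    y = np.asarray(y, dtype=float)
    const = np.ones((len(y), 1))
    W = const if controls is None else np.column_stack([const, controls])
    # Stage 1: project the endogenous regressor on the instrument and controls.
    Z = np.column_stack([W, z])
    gamma, *_ = np.linalg.lstsq(Z, np.asarray(x_endog, dtype=float), rcond=None)
    x_hat = Z @ gamma
    # Stage 2: regress the outcome on the fitted values and controls.
    X2 = np.column_stack([W, x_hat])
    beta, *_ = np.linalg.lstsq(X2, y, rcond=None)
    return beta[-1]  # coefficient on the instrumented regressor
```

Here `x_endog` would play the role of Myopia , `z` the famine dummy, and `y` the over-financialization degree ExFin .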

The results of the endogeneity tests are shown in Table 6 . Columns (1) and (3) are the first-stage results of the two instrumental variable methods, while Columns (2) and (4) are the second-stage results. The coefficients of the instrumental variable are positive and statistically significant (p<0.01). Moreover, the coefficient of managerial myopia remains positive and significant in the second stage, reinforcing the robustness of the observed relationship. The Anderson canonical correlation Lagrange multiplier (LM) statistic and the Cragg–Donald Wald F statistic also pass their tests, suggesting that the instrumental variable does not suffer from underidentification or weak-instrument problems.

Moderating role of financial constraints and financial distress

The subsequent subsection delves into the potential “reservoir effect” of corporate financialization in the context of mitigating financial risks. Specifically, this subsection mainly examines whether the presence of financial risks amplifies the likelihood of over-financialization. The research mainly classifies financial risks into two categories: financing constraints and financial distress. Financing constraints entail challenges faced by enterprises in raising external funds, while financial distress denotes a financial crisis that can disrupt the capital turnover process. This paper employs the size and age index (SA index) and zeta score (Z-Score) to quantify these two risks. The results are shown in Table 7 .

Table 7. https://doi.org/10.1371/journal.pone.0309140.t007

Columns (1) and (3) present the moderating influence of the two risk variables on the relationship between managerial myopia and the degree of over-financialization. Columns (2) and (4) present their moderating role in the influence of managerial myopia on whether over-financialization occurs. The results reveal that the coefficient of SA × Myopia is positive in both Columns (1) and (2) but statistically significant only in Column (1) (p<0.05). Similarly, the coefficient of Zfin × Myopia is positive in both Columns (3) and (4), with statistical significance observed only in Column (3) (p<0.10). These findings suggest that an increase in financial risk, as indicated by the SA index and Z-Score, intensifies the degree of financialization driven by managerial myopia. However, this heightened financial risk does not necessarily trigger over-financialization among enterprises.

This paper has constructed an optimal financialization level index for enterprises based on a sample of nonfinancial listed companies in China from 2005 to 2022, and has used this index to analyze empirically the impact of managerial myopia on firms’ over-financialization. The findings reveal a prevalent trend of financialization in China’s real enterprises. While a significant number of firms are over-financialized, the majority of enterprises fall within a moderate range of financialization. From the perspective of managers in corporate governance, managerial myopia favors financialization to obtain short-term benefits and triggers over-financialization behaviors, which are detrimental to firms’ long-term interests. Under financial distress and financing constraints, shortsighted managers are more inclined to increase their holdings of financial assets to make quick short-term gains and tide over difficulties. However, this behavior does not necessarily result in over-financialization, suggesting that financialization under such circumstances may be a strategic response rather than a cause of over-financialization.

Discussion and suggestions

This paper has introduced a valuable distinction between over-financialization and corporate financialization, which contributes to a more rational understanding and analysis of real firms’ financialization behaviors, and it has highlighted the difference between managerial myopia and financialization. The findings suggest that managerial myopia leads to corporate over-financialization, providing a new explanation for the intrinsic motivation. Furthermore, the findings highlight a potential avenue for corporate governance strategies to address and mitigate the influence of managerial myopia, thereby curbing the occurrence of over-financialization. Finally, the findings suggest that researchers can explore the broader economic ramifications of excessive finance from the over-financialization perspective in the future, rather than simply viewing financialization as a homogenous behavior.

Research limitations

While this study contributes to the understanding of corporate financialization and proposes a more balanced governance approach for firms by considering both short-term and long-term shareholder interests, it does have certain limitations. From the long-term governance perspective, companies that spend more of their earnings on innovation and infrastructure may be more likely to achieve higher levels of success in the future. However, the prevalence of financialization behaviors in the sample of real firms in China hinders direct comparisons between the two approaches, which is an inherent constraint of this research. In light of these limitations, future research can delve deeper into the long-term earnings and performance outcomes of financialized versus non-financialized firms.

Acknowledgments

We would like to thank all members of the Doctoral Program in Collaborative Innovation Center of Modern Grain Circulation and Safety, and all support from the Nanjing University of Finance and Economics for making it possible to carry out this work.

COMMENTS

  1. Financial Statement Analysis: A Review and Current Issues

    The literature on financial statement analysis attempts to improve fundamental analysis and to identify market inefficiencies with respect to financial statement information. In this paper, I review the extant research on financial statement analysis.

  2. CHAPTER#2 Literature review comparing and analyzing financial

    s of financial planning analysis and decision making is the financial information (Statement. ). Financial statements are needed to predict, compare and evaluate a firm's earning ability. t is ...

  3. Financial statement analysis: a review and current issues

    Abstract. Purpose The literature on financial statement analysis attempts to improve fundamental analysis and to identify market inefficiencies with respect to financial statement information ...

  4. Financial statement analysis: a review and current issues

    The literature on financial statement analysis attempts to improve fundamental analysis and to identify market inefficiencies with respect to financial statement information.,In this paper, the author reviews the extant research on financial statement analysis.,The author then provides some preliminary evidence using Chinese data and offer ...

  5. Financial Statement Analysis

    A primary approach to evaluating and comparing financial performance of enterprises is a ratio analysis, which deals with a set of metrics that are typically computed on the basis of inputs extracted from primary financial statements (discussed in Chapter 1) and notes to them (discussed in Chapter 2).As will be demonstrated in the following sections, most of those accounting ratios are ...

  6. 50589 PDFs

    Financial statement analysis is a critical component of decision-making for businesses, investors, and financial professionals. To enhance the accuracy and effectiveness of such analysis, this ...

  7. Financial Statement Analysis

    Abstract. This chapter provides fundamental financial analysis based on ratio analysis, a powerful tool to assess the performance of a firm over a period, or to compare risk and return of firms of different sizes. The discussion centres on the income statement, the balance sheet, the statement of shareholders' equity, and the cash flow ...

  8. Financial statement analysis: Principal component analysis (PCA

    Literature review. Financial analysis involves the use of quantitative information from financial statements, that is, income statement, balance sheet and statement of cash flows in order to come up with relationships of the items that are reported by the company according to the accounting standards for reporting.

  9. PDF Financial Statement Analysis: A Review and Current Issues

    The literature on financial statement analysis attempts to improve fundamental analysis and to identify market inefficiencies with respect to financial statement information. In this paper, I review the extant research on financial statement analysis. I then provide some

  10. Financial statement analysis: a review and current issues

    Abstract. Purpose - The literature on financial statement analysis attempts to improve fundamental analysis and to identify market inefficiencies with respect to financial statement information. Design/methodology/approach - In this paper, the author reviews the extant research on financial statement analysis.

  11. Financial reporting quality of financial institutions: Literature review

    1. Introduction. Financial disclosure is a statement issued by a firm, business, or corporation that defines the financial strategies being employed and provides information such as expenses and earnings for a specific period (Alslihat et al., Citation 2017).Corporations are now compelled to fully disclose all financial and non-financial information.

  12. A Literature Review of Financial Performance Measures and Value

    The study is based on the theory background and relevant researches in the areas of performance measures disclosed in financial statements. The sample of the case studies and sorts of literature are specifically collected from the well-known and respected accounting journals investigating in performance measures areas from 2010 to 2016, which are available on open access.

  13. Financial statement analysis: a review and

    Introduction. In this paper I review the trends in the literature on financial statement analysis (FSA), and provide insights into the relevance of FSA research in emerging trends. FSA research is generally concerned with two key issues - improving fundamental analysis and identifying market inefficiencies with respect to financial statement ...

  14. PDF AP21C: Literature review

    Separating operating from financing activities is value relevant. Research. Findings. Libby et al (2013) Reviews literature and provides a framework for understanding academic research on earnings presentation: users rely on disaggregation, but it is most useful if provided cohesively across all PFS.

  15. PDF Financial statement analysis: a review and current issues

    UNSW Business School, Sydney, Australia. Abstract. Purpose - The literature on financial statement analysis attempts to improve fundamental analysis and to identify market inefficiencies with respect to financial statement information. Design/methodology/approach - In this paper, the author reviews the extant research on financial.

  16. Book Review: Financial Statement Analysis: A Practitioner's Guide

    Financial Statement Analysis: A Practitioner's Guide is a well-organized, thorough exploration of the challenges facing practitioners who rely on financial statements to make investment and lending decisions. In the preface, Martin Fridson and Fernando Alvarez state that their "intention is to acquaint readers who have already acquired basic accounting skills with the complications that ...

  17. Financial Statement Analysis: How It's Done, by Statement Type

    Financial statement analysis is the process of analyzing a company's financial statements for decision-making purposes. External stakeholders use it to understand the overall health of an ...

  18. PDF Financial Statement Analysis of Dabur India: A Comprehensive Review of

    Literature Review: Financial Statement Analysis is a method used by external stakeholders to assess an organization's overall health as well as its financial performance and commercial worth when using the financial statements to make decisions.

  19. An Examination of Financial Performance: a Review Study

    KEYWORDS: Financial performance analysis, literature review. In order to undertake any research a strong background study is always required. Research always stands on a strong footing in the form of comprehensive and extensive research review. ... Financial statement analysis using ratios has been one of the most commonly used primary models ...

  20. Financial Statement Analysis of ITC Limited by Neha Rawat

    This paper provides a detailed financial analysis of ITC Ltd in an attempt to assess the company's efficiency and performance. The study focuses on the past and present performance of ITC Ltd over a five-year period to analyze trends. Keywords: Financial Analysis, Ratio analysis, DuPont analysis.

  21. Two decades of financial statement fraud detection literature review

    The emergence of machine learning has opened a new way for researchers. It allows them to supplement the traditional manual methods for conducting a literature review and turning it into smart literature. This study aims to present a framework for incorporating machine learning into financial statement fraud (FSF) literature analysis.

  22. PDF Financial Ratio Analysis: A Theoretical Study

    financial statement analysis. A ratio is a mathematical calculation that analyzes the relationship between two or more variables, expressed as a fraction, proportion, percentage, or number of times. When a figure is calculated from two accounting numbers derived from the financial statements, it is termed an accounting ratio.
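
    The definition above can be illustrated with a minimal sketch, assuming two hypothetical balance-sheet figures (current assets and current liabilities, neither taken from the source). The function name `accounting_ratio` and its output keys are illustrative assumptions, not part of the cited paper; the sketch only shows one relationship expressed in the four forms the snippet lists.

```python
from fractions import Fraction


def accounting_ratio(numerator: int, denominator: int) -> dict:
    """Express the relationship between two accounting numbers in four forms."""
    if denominator == 0:
        raise ValueError("denominator must be non-zero")
    frac = Fraction(numerator, denominator)  # reduced automatically, e.g. 2/1
    return {
        "fraction": frac,
        "proportion": f"{frac.numerator}:{frac.denominator}",
        "percentage": float(frac) * 100,
        "times": float(frac),
    }


# Hypothetical figures, chosen only for illustration.
current_assets = 500_000
current_liabilities = 250_000

current_ratio = accounting_ratio(current_assets, current_liabilities)
print(current_ratio["proportion"])  # proportion form
print(current_ratio["percentage"])  # percentage form
print(current_ratio["times"])       # "number of times" form
```

    Using `Fraction` keeps the proportion exact before converting to a float for the percentage and "times" forms.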

  23. (PDF) Financial Performance Analysis (MBA project)

    Introduction Literature Review Research Method ... Analysis of financial statements is the process of evaluating the ... In brief, financial analysis is the process of selection, relation and ...

  24. Financial Performance Analysis

    Financial Performance Analysis - Literature Review. This document summarizes several studies that examined the financial performance of companies in various industries and countries using financial ratios and other statistical analysis methods ...

  25. Understanding Financial Statements for Business Analysis

    This paper will review the financial statements needed in the analysis of a business's financial performance: the income statement, balance sheet, and statement of cash flows. Analyzing financial statements is an essential component of determining profitability and accurately measuring how a business is performing financially.

  26. Environmental reporting in public sector organizations: A review of

    A structured literature review was conducted by content and bibliometric analysis as well as applying the analytical framework by Manes Rossi et al. (2020) to draft the state-of-the-art of ER in public sector organizations and link them to the environmental SDG targets.

  27. Financial fraud detection through the application of machine learning

    The information presented in Fig. 4 is the result of a clustering analysis of the articles resulting from the literature review on financial fraud detection by implementing ML models. In total, 48 ...

  28. Full article: Students' ideas about the scientific underpinnings of

    In the literature analysed, a wide range of terms is employed to describe the constructs under investigation; it is common for authors to use multiple terms. ... This systematic review of the literature on students' ideas about the scientific underpinnings of CC presented large amounts of data, and a multitude of descriptions ...

  29. Does managerial myopia promote enterprises over-financialization

    This paper analyzes the potential shortsightedness of enterprise managers through annual reports. Additionally, we use corporate financial statement data to measure enterprises over-financialization in terms of resource allocation. After testing with a causal inference model, we find that firms with managerial myopia significantly contribute to over-financialization. It remains robust even ...

  30. Exploring organizational career growth: a systematic literature review

    2.1. Article selection process. The SLR provides an objective, comprehensive, replicable, scientific, and transparent empirical research process through an exhaustive search of the published literature on keywords or important themes (Cook et al., 1997). It aims to collect as many relevant details as possible from each piece of literature, encompassing methods, variables, and analyses.