J Med Libr Assoc. v.106(4); 2018 Oct

A systematic approach to searching: an efficient and complete method to develop literature searches


Creating search strategies for systematic reviews, finding the best balance between sensitivity and specificity, and translating search strategies between databases is challenging. Several methods describe standards for systematic search strategies, but a consistent approach for creating an exhaustive search strategy has not yet been described in enough detail to be replicable. The authors have established a method that describes, step by step, the process of developing a systematic search strategy as needed in the systematic review. This method describes how single-line search strategies can be prepared in a text document by typing search syntax (such as field codes, parentheses, and Boolean operators) before copying and pasting search terms (keywords and free-text synonyms) that are found in the thesaurus. To help ensure term completeness, we developed a novel optimization technique that compares the results retrieved by thesaurus terms with those retrieved by free-text search words to identify potentially relevant candidate search terms. Macros in Microsoft Word have been developed to convert syntaxes between databases and interfaces almost automatically. This method can be used to create the complex and comprehensive search strategies needed for systematic reviews across different databases and interfaces, and it will assist both information specialists developing librarian-mediated searches and medical and health care practitioners searching for evidence to answer clinical questions.

INTRODUCTION

Librarians and information specialists are often involved in the process of preparing and completing systematic reviews (SRs), where one of their main tasks is to identify relevant references to include in the review [ 1 ]. Although several recommendations for the process of searching have been published [ 2 – 6 ], none describe the development of a systematic search strategy from start to finish.

Traditional methods of SR search strategy development and execution are highly time consuming, reportedly requiring up to 100 hours or more [ 7 , 8 ]. The authors wanted to develop systematic and exhaustive search strategies more efficiently, while preserving the high sensitivity that SR search strategies necessitate. In this article, we describe the method developed at Erasmus University Medical Center (MC) and demonstrate its use through an example search. The efficiency of the search method and outcome of 73 searches that have resulted in published reviews are described in a separate article [ 9 ].

As we aimed to describe the creation of systematic searches in full detail, the method starts at a basic level with the analysis of the research question and the creation of search terms. Readers who are new to SR searching are advised to follow all steps described. More experienced searchers can consider the basic steps to be existing knowledge that will already be part of their normal workflow, although step 4 probably differs from general practice. Experienced searchers will gain the most from reading about the novelties in the method as described in steps 10–13 and comparing the examples given in the supplementary appendix to their own practice.

CREATING A SYSTEMATIC SEARCH STRATEGY

Our methodology for planning and creating a multi-database search strategy consists of the following steps:

  • Determine a clear and focused question
  • Describe the articles that can answer the question
  • Decide which key concepts address the different elements of the question
  • Decide which elements should be used for the best results
  • Choose an appropriate database and interface to start with
  • Document the search process in a text document
  • Identify appropriate index terms in the thesaurus of the first database
  • Identify synonyms in the thesaurus
  • Add variations in search terms
  • Use database-appropriate syntax, with parentheses, Boolean operators, and field codes
  • Optimize the search
  • Evaluate the initial results
  • Check for errors
  • Translate to other databases
  • Test and reiterate

Each step in the process is reflected by an example search described in the supplementary appendix .

1. Determine a clear and focused question

A systematic search can best be applied to a well-defined and precise research or clinical question. Questions that are too broad or too vague cannot be answered easily in a systematic way and will generally result in an overwhelming number of search results. On the other hand, a question that is too specific will result in too few or even zero search results. Various papers describe this process in more detail [ 10 – 12 ].

2. Describe the articles that can answer the question

Although not all clinical or research questions can be answered in the literature, the next step is to presume that the answer can indeed be found in published studies. A good starting point for a search is hypothesizing what the research that can answer the question would look like. These hypothetical (when possible, combined with known) articles can be used as guidance for constructing the search strategy.

3. Decide which key concepts address the different elements of the question

Key concepts are the topics or components that the desired articles should address, such as diseases or conditions, actions, substances, settings, domains (e.g., therapy, diagnosis, etiology), or study types. Key concepts from the research question can be grouped to create elements in the search strategy.

Elements in a search strategy do not necessarily follow the patient, intervention, comparison, outcome (PICO) structure or any other related structure. Using PICO or a similar framework as guidance can be helpful, especially in the inclusion and exclusion review stage of the SR, but it is not necessary for good search strategy development [ 13 – 15 ]. Sometimes concepts from different parts of the PICO structure can be grouped together into one search element, such as when the desired outcome is frequently described in a certain study type.

4. Decide which elements should be used for the best results

Not all elements of a research question should necessarily be used in the search strategy. Some elements are less important than others or may unnecessarily complicate or restrict a search strategy. Adding an element to a search strategy increases the chance of missing relevant references. Therefore, the number of elements in a search strategy should remain as low as possible to optimize recall.

Using the schema in Figure 1 , elements can be ordered by their specificity and importance to determine the best search approach. Whether an element is more specific or more general can be measured objectively by the number of hits retrieved in a database when searching for a key term representing that element. Depending on the research question, certain elements are more important than others. If articles (hypothetically or known) exist that can answer the question but lack a certain element in their titles, abstracts, or keywords, that element is unimportant to the question. An element can also be unimportant because of expected bias or an overlap with another element.

Figure 1. Schema for determining the optimal order of elements

Bias in elements

The choice of elements in a search strategy can introduce bias through use of overly specific terminology or terms often associated with positive outcomes. For the question “does prolonged breastfeeding improve intelligence outcomes in children?,” searching specifically for the element of duration will introduce bias, as articles that find a positive effect of prolonged breastfeeding will be much more likely to mention time factors in their titles or abstracts.

Overlapping elements

Elements in a question sometimes overlap in their meaning. Certain therapies are interventions for one specific disease. The Lichtenstein technique, for example, is a repair method for inguinal hernias, so there is no need to add an element for “inguinal hernias” to a search for the effectiveness of the Lichtenstein technique. Likewise, certain diseases are only found in certain populations. Adding such an overlapping element could lead to missing relevant references.

The elements to use in a search strategy can be found in the plot of elements in Figure 1 , by following the top row from left to right. For this method, we recommend starting with the most important and specific elements. Then, continue with more general and important elements until the number of results is acceptable for screening. Determining how many results are acceptable for screening is often a matter of negotiation with the SR team.
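The ordering logic in Figure 1 can be sketched in code. In this illustrative Python sketch (not part of the published method), each element carries an importance flag and a hit count standing in for specificity; the elements and hit counts are hypothetical:

```python
# Order search elements as in the Figure 1 schema: important elements
# before unimportant ones, and within each group the most specific
# element (fewest database hits) first.
def order_elements(elements):
    """elements: list of (name, important: bool, approximate_hits: int)."""
    return sorted(elements, key=lambda e: (not e[1], e[2]))

# Hypothetical elements and hit counts for the breastfeeding example.
elements = [
    ("intelligence", True, 180_000),
    ("child", False, 2_500_000),
    ("breastfeeding", True, 60_000),
]

print([name for name, _, _ in order_elements(elements)])
# ['breastfeeding', 'intelligence', 'child']
```

Elements would then be added to the strategy in this order until the number of results is acceptable for screening.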

5. Choose an appropriate database and interface to start with

Important factors for choosing databases to use are the coverage and the presence of a thesaurus. For medically oriented searches, the coverage and recall of Embase, which includes the MEDLINE database, are superior to those of MEDLINE [ 16 ]. Each of these two databases has its own thesaurus with its own unique definitions and structure. Because Emtree, the Embase thesaurus, contains much more specific thesaurus terms than the MEDLINE Medical Subject Headings (MeSH) thesaurus, translation from Emtree to MeSH is easier than the other way around. Therefore, we recommend starting in Embase.

MEDLINE and Embase are available through many different vendors and interfaces. The choice of an interface and primary database is often determined by the searcher’s accessibility. For our method, an interface that allows searching with proximity operators is desirable, and full functionality of the thesaurus, including explosion of narrower terms, is crucial. We recommend developing a personal workflow that always starts with one specific database and interface.

6. Document the search process in a text document

We advise designing and creating the complete search strategies in a log document, instead of directly in the database itself, to register the steps taken and to make searches accountable and reproducible. The developed search strategies can be copied and pasted into the desired databases from the log document. This way, the searcher is in control of the whole process. Any change to the search strategy should be made in the log document, ensuring that the search strategy in the log is always the most recent.

7. Identify appropriate index terms in the thesaurus of the first database

Searches should start by identifying appropriate thesaurus terms for the desired elements. The thesaurus of the database is searched for matching index terms for each key concept. We advise restricting the initial terms to the most important and most relevant terms. Later in the process, more general terms can be added in the optimization process, in which the effect on the number of hits, and thus the desirability of adding these terms, can be evaluated more easily.

Several factors can complicate the identification of thesaurus terms. Sometimes, one thesaurus term is found that exactly describes a specific element. In contrast, especially in more general elements, multiple thesaurus terms can be found to describe one element. If no relevant thesaurus terms have been found for an element, free-text terms can be used, and possible thesaurus terms found in the resulting references can be added later (step 11).

Sometimes, no distinct thesaurus term is available for a specific key concept that describes the concept in enough detail. In Emtree, one thesaurus term often combines two or more elements. The easiest solution for combining these terms for a sensitive search is to use such a thesaurus term in all elements where it is relevant. Examples are given in the supplementary appendix .

8. Identify synonyms in the thesaurus

Most thesauri offer a list of synonyms on their term details page (named Synonyms in Emtree and Entry Terms in MeSH). To create a sensitive search strategy for SRs, these terms need to be searched as free-text keywords in the title and abstract fields, in addition to searching their associated thesaurus terms.

The Emtree thesaurus contains more synonyms (300,000) than MeSH does (220,000) [ 17 ]. The difference in number of terms is even higher considering that many synonyms in MeSH are permuted terms (i.e., inversions of phrases using commas).

Thesaurus terms are ordered in a tree structure. When searching for a more general thesaurus term, the more specific (narrower) terms in the branches below that term will also be searched (this is frequently referred to as “exploding” a thesaurus term). However, to perform a sensitive search, all relevant variations of the narrower terms must also be searched as free-text keywords in the title or abstract, in addition to relying on the exploded thesaurus term. This way, articles that describe a narrower topic in their titles or abstracts are retrieved even before thesaurus terms have been assigned to them.

9. Add variations in search terms (e.g., truncation, spelling differences, abbreviations, opposites)

Truncation allows a searcher to search for words beginning with the same word stem. A search for therap* will, thus, retrieve therapy, therapies, therapeutic, and all other words starting with “therap.” Do not truncate a word stem that is too short. Also, limitations of interfaces should be taken into account, especially in PubMed, where the number of search term variations that can be found by truncation is limited to 600.

Databases contain references to articles using both standard British and American English spellings. Both need to be searched as free-text terms in the title and abstract. Alternatively, many interfaces offer a wildcard character that matches zero or one characters, allowing a search for “pediatric” or “paediatric” as “p?ediatric.” Table 1 provides a detailed description of the syntax for different interfaces.

Table 1. Field codes in the five most used interfaces for biomedical literature searching

|                          | PubMed | Ovid | EBSCOhost | Embase.com | ProQuest |
|--------------------------|--------|------|-----------|------------|----------|
| Title/abstract           | [tiab] | ().ab,ti. | TI () OR AB () | ():ab,ti | AB,TI() |
| All fields               | [All Fields] | .af. | — | — | ALL |
| Thesaurus term           | [mesh:noexp] | …/ | MH “…” | ‘…’/de | MESH(…) |
| Including narrower       | [mesh] | exp …/ | MH “…+” | ‘…’/exp | MESH#(…) |
| Combined with subheading | [mesh] | exp …/ | MH “…+/ ” | ‘…’/exp/dm_ | MESH(… LNK ..) |
| Free subheading          | [sh] | .xs. or .fs. | MW | :lnk | — |
| Publication type         | [pt] | .pt. or exp / | PT | :it | RTYPE |
| Proximity                | — | ADJn | Nn | NEAR/n, NEXT/n | N/n |
| Exact phrase             | “double quotes” | No quotes needed | “double quotes” | ‘single quotes’ | “double quotes” |
| Truncated phrase         | Use-hyphen* | No quote* | No quote* | ‘single quote*’ | “Double quote*” |
| Truncation position      | End | End/mid | End/mid | End/mid | End/mid/start |
| Infinite characters      | * | * or $ | * | * | * |
| 0 or 1 character         | — | ? | # | — | $1 |
| 1 character              | — | # | ? | ? | ? |
| Added to database since  | yyyy/mm/dd:yyyy/mm/dd [edat] (or [mhda]) | limit #N to rd=yyyymmdd-yyyymmdd | EM yyyymmdd-yyyymmdd | [dd-mm-yyyy]/sd | LUPD(yyyymmdd) |
| Publication period (years) | yyyy:yyyy[dp] | limit #N to yr=yyyy-yyyy | PY yyyy-yyyy | [yyyy-yyyy]/py | YR(yyyy-yyyy) |
| Record sets              | #1 | 1 | S1 | #1 | S1 |

Searching for abbreviations can identify extra, relevant references and retrieve more irrelevant ones. The search can be more focused by combining the abbreviation with an important word that is relevant to its meaning or by using the Boolean “NOT” to exclude frequently observed, clearly irrelevant results. We advise that searchers do not exclude all possible irrelevant meanings, as it is very time consuming to identify all the variations, it will result in unnecessarily complicated search strategies, and it may lead to erroneously narrowing the search and, thereby, reduce recall.

Searching partial abbreviations can be useful for retrieving relevant references. For example, it is very likely that an article would mention osteoarthritis (OA) early in the abstract, replacing all further occurrences of osteoarthritis with OA . Therefore, it may not contain the phrase “hip osteoarthritis” but only “hip oa.”

It is also important to search for the opposites of search terms to avoid bias. When searching for “disease recurrence,” articles about “disease free” may be relevant as well. When the desired outcome is survival , articles about mortality may be relevant.
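The effect of truncation and of zero-or-one-character wildcards can be checked mechanically. This Python sketch (an illustration, not a database feature) converts a wildcard pattern, using the `*` and `?` semantics from Table 1, into a regular expression and tests which words it would match:

```python
import re

# Convert a database-style wildcard term into a regular expression:
# "*" matches any number of characters, "?" matches zero or one
# (the Ovid/Embase.com convention from Table 1).
def wildcard_to_regex(term):
    pattern = re.escape(term).replace(r"\*", ".*").replace(r"\?", ".?")
    return re.compile(rf"^{pattern}$", re.IGNORECASE)

print(bool(wildcard_to_regex("therap*").match("therapeutic")))    # True
print(bool(wildcard_to_regex("p?ediatric").match("pediatric")))   # True
print(bool(wildcard_to_regex("p?ediatric").match("paediatric")))  # True
```

A check like this makes it easy to confirm that a stem such as therap* is not so short that it matches unintended words.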

10. Use database-appropriate syntax, with parentheses, Boolean operators, and field codes

Different interfaces require different syntaxes: the special set of rules and symbols unique to each database that define how a correctly constructed search operates. Common syntax components include parentheses and the Boolean operators “AND,” “OR,” and “NOT,” which are available in all major interfaces. An overview of the syntaxes of five major interfaces for bibliographic medical databases (PubMed, Ovid, EBSCOhost, Embase.com, and ProQuest) is shown in Table 1 .

Creating the appropriate syntax for each database, in combination with the selected terms as described in steps 7–9, can be challenging. Following the method outlined below simplifies the process:

  • Create single-line queries in a text document (not combining multiple record sets), which allows immediate checking of the relevance of retrieved references and efficient optimization.
  • Type the syntax (Boolean operators, parentheses, and field codes) before adding terms, which reduces the chance that errors are made in the syntax, especially in the number of parentheses.
  • Use predefined proximity structures including parentheses, such as (() ADJ3 ()) in Ovid, that can be reused in the query when necessary.
  • Use thesaurus terms separately from free-text terms of each element. Start an element with all thesaurus terms (using “OR”) and follow with the free-text terms. This allows the unique optimization methods as described in step 11.
  • When adding terms to an existing search strategy, pay close attention to the position of the cursor. Make sure to place it appropriately either in the thesaurus terms section, in the title/abstract section, or as an addition (broadening) to an existing proximity search.

The supplementary appendix explains the method of building a query in more detail, step by step for different interfaces: PubMed, Ovid, EBSCOhost, Embase.com, and ProQuest. This method results in a basic search strategy designed to retrieve some relevant references upon which a more thorough search strategy can be built with optimization such as described in step 11.
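The single-line structure can be illustrated with a small helper. This Python sketch is hypothetical (it uses Embase.com syntax from Table 1, and the terms are illustrative, not a vetted strategy); it keeps the thesaurus terms and free-text terms of each element in separate, clearly ordered groups, as the method prescribes:

```python
# Build one element: exploded thesaurus terms first, then free-text
# terms restricted to title/abstract, all joined with OR.
def embase_element(thesaurus_terms, freetext_terms):
    thesaurus = " OR ".join(f"'{t}'/exp" for t in thesaurus_terms)
    freetext = " OR ".join(freetext_terms)
    return f"({thesaurus} OR ({freetext}):ab,ti)"

# Combine elements with AND into a single-line query.
def combine(elements):
    return " AND ".join(elements)

query = combine([
    embase_element(["breast feeding"], ["breastfeeding", "breastfed"]),
    embase_element(["intelligence"], ["intelligence", "iq"]),
])
print(query)
# ('breast feeding'/exp OR (breastfeeding OR breastfed):ab,ti) AND ('intelligence'/exp OR (intelligence OR iq):ab,ti)
```

Because every element keeps the same internal shape, new synonyms can later be pasted into the correct group without recounting parentheses.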

11. Optimize the search

The most important question when performing a systematic search is whether all (or most) potentially relevant articles have been retrieved by the search strategy. This is also the most difficult question to answer, since it is unknown which and how many articles are relevant. It is, therefore, wise first to broaden the initial search strategy, making the search more sensitive, and then check if new relevant articles are found by comparing the set results (i.e., search for Strategy #2 NOT Strategy #1 to see the unique results).

A search strategy should be tested for completeness. Therefore, it is necessary to identify extra, possibly relevant search terms and add them to the test search in an OR relationship with the already used search terms. A good place to start, and a well-known strategy, is scanning the top retrieved articles when sorted by relevance, looking for additional relevant synonyms that could be added to the search strategy.

We have developed a unique optimization method that has not been described before in the literature. This method often adds valuable extra terms to our search strategy and, therefore, extra, relevant references to our search results. Extra synonyms can be found in articles that have been assigned a certain set of thesaurus terms but that lack synonyms in the title and/or abstract that are already present in the current search strategy. Searching for thesaurus terms NOT free-text terms will help identify missed free-text terms in the title or abstract. Searching for free-text terms NOT thesaurus terms will help identify missed thesaurus terms. If this is done repeatedly for each element, leaving the rest of the query unchanged, this method will help add numerous relevant terms to the query. These steps are explained in detail for five different search platforms in the supplementary appendix .
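The two check searches can be expressed generically. In this Python sketch (illustrative; the query fragments are hypothetical Embase.com syntax), the element under scrutiny is split into its thesaurus and free-text groups and combined with the rest of the query:

```python
# Build the two optimization queries from step 11 for one element:
#   thesaurus NOT free-text  -> surfaces missed free-text terms
#   free-text NOT thesaurus  -> surfaces missed thesaurus terms
def optimization_queries(thesaurus_part, freetext_part, rest_of_query):
    missed_freetext = f"({thesaurus_part} NOT ({freetext_part})) AND {rest_of_query}"
    missed_thesaurus = f"(({freetext_part}) NOT {thesaurus_part}) AND {rest_of_query}"
    return missed_freetext, missed_thesaurus

q1, q2 = optimization_queries(
    "'breast feeding'/exp",
    "(breastfeeding OR breastfed):ab,ti",
    "('intelligence'/exp OR intelligence:ab,ti)",
)
print(q1)  # scan these results for free-text synonyms missing from the element
print(q2)  # scan these results for thesaurus terms missing from the element
```

Running each pair of queries and scanning the unique results, element by element, is the repetition the step describes.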

12. Evaluate the initial results

The results should now contain relevant references. If the interface allows relevance ranking, use that in the evaluation. If you know some relevant references that should be included in the research, search for those references specifically; for example, combine a specific (first) author name with a page number and the publication year. Check whether those references are retrieved by the search. If the known relevant references are not retrieved by the search, adapt the search so that they are. If it is unclear which element should be adapted to retrieve a certain article, combine that article with each element separately.

Different outcomes are desired for different types of research questions. In clinical question answering, for instance, the researcher will not be satisfied with a large result set containing many irrelevant references. A clinical search should be rather specific and is allowed to miss a relevant reference. In the case of an SR, the researchers do not want to miss any relevant reference and are willing to handle many irrelevant references to achieve that. The search for references to include in an SR should be very sensitive: no included reference should be missed. A search that is too specific or too sensitive for the intended goal can be adapted to become more sensitive or specific. Steps to increase sensitivity or specificity of a search strategy can be found in the supplementary appendix .

13. Check for errors

Errors might not be easily detected. Sometimes clues can be found in the number of results, either when the number of results is much higher or lower than expected or when many retrieved references are not relevant. However, the number expected is often unknown, and very sensitive search strategies will always retrieve many irrelevant articles. Each query should, therefore, be checked for errors.

One of the most frequently occurring errors is a missing Boolean operator “OR.” When no “OR” is added between two search terms, many interfaces automatically add an “AND,” which unintentionally reduces the number of results and likely misses relevant references. One good strategy for identifying missing “OR”s is to go to the web page containing the full search strategy, as translated by the database, and use Ctrl-F to search for “AND.” Check whether each occurrence of the “AND” operator is deliberate.
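Checks like these can also be partly automated. This Python sketch (an illustration, not part of the published method) lists every standalone “AND” in a single-line query for manual confirmation and verifies that the parentheses are balanced:

```python
import re

# Return the positions of every standalone "AND" (so each can be
# confirmed as deliberate) and whether parentheses are balanced.
def check_query(query):
    and_positions = [m.start() for m in re.finditer(r"\bAND\b", query)]
    depth = 0
    for ch in query:
        if ch == "(":
            depth += 1
        elif ch == ")":
            depth -= 1
            if depth < 0:               # closing parenthesis with no opener
                return and_positions, False
    return and_positions, depth == 0

query = "('breast feeding'/exp OR breastfeeding:ab,ti) AND 'intelligence'/exp"
positions, balanced = check_query(query)
print(len(positions), balanced)  # 1 True
```

A word-boundary match is used so that terms containing the letters “and” are not flagged.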

Ideally, search strategies should be checked by other information specialists [ 18 ]. The Peer Review of Electronic Search Strategies (PRESS) checklist offers good guidance for this process [ 4 ]. Apart from the syntax (especially Boolean operators and field codes) of the search strategy, it is wise to have the search terms checked by the clinician or researcher familiar with the topic. At Erasmus MC, researchers and clinicians are involved during the complete process of structuring and optimizing the search strategy. Each word is added after the combined decision of the searcher and the researcher, with the possibility of directly comparing results with and without the new term.

14. Translate to other databases

To retrieve as many relevant references as possible, one has to search multiple databases. Translation of complex and exhaustive queries between different databases can be very time consuming and cumbersome. The single-line search strategy approach detailed above allows quick translations using the find and replace method in Microsoft Word (<Ctrl-H>).

At Erasmus MC, macros based on the find-and-replace method in Microsoft Word have been developed for easy and fast translation between the most used databases for biomedical and health sciences questions. The schema that is followed for the translation between databases is shown in Figure 2 . Most databases simply follow the structure set by the Embase.com search strategy. The translation from Emtree terms to MeSH terms for MEDLINE in Ovid often identifies new terms that need to be added to the Embase.com search strategy before the translation to other databases.

Figure 2. Schematic representation of translation between databases used at Erasmus University Medical Center. Dotted lines represent databases that are used in less than 80% of the searches.

Using five different macros, a thoroughly optimized query in Embase.com can be relatively quickly translated into eight major databases. Basic search strategies will be created to use in many, mostly smaller, databases, because such niche databases often do not have extensive thesauri or advanced syntax options. Also, there is not much need to use extensive syntax because the number of hits and, therefore, the amount of noise in these databases is generally low. In MEDLINE (Ovid), PsycINFO (Ovid), and CINAHL (EBSCOhost), the thesaurus terms must be adapted manually, as each database has its own custom thesaurus. These macros and instructions for their installation, use, and adaptation are available at bit.ly/databasemacros.
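The macros themselves run in Word, but the underlying idea is an ordered series of textual substitutions. This Python sketch is a simplified stand-in: the rules shown cover only a few Embase.com-to-Ovid codes from Table 1, and thesaurus terms would still need manual adaptation to MeSH:

```python
# Ordered find-and-replace rules: each (old, new) pair rewrites one
# piece of Embase.com syntax into its Ovid equivalent.  Order matters:
# field codes are rewritten before quotes are stripped.
RULES = [
    (":ab,ti", ".ab,ti."),   # title/abstract field code
    ("NEAR/", "ADJ"),        # proximity operator
    ("'", ""),               # Ovid needs no phrase quotes
]

def translate(query):
    for old, new in RULES:
        query = query.replace(old, new)
    return query

print(translate("('breastfeeding' NEAR/3 'duration'):ab,ti"))
# (breastfeeding ADJ3 duration).ab,ti.
```

A real rule set would also handle thesaurus field codes, date limits, and the other rows of Table 1.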

15. Test and reiterate

Ideally, exhaustive search strategies should retrieve all references that are covered in a specific database. For SR search strategies, checking searches for their recall is advised. This can be done after the included references have been determined by the authors of the systematic review. If additional papers have been identified through other, non-database methods (e.g., checking the references of included studies), results that were not identified by the database searches should be examined. If these results were available in the databases but not located by the search strategy, the search strategy should be adapted to try to retrieve them, as they may contain terms that were omitted in the original search strategies. This may enable the identification of additional relevant results.
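The recall check itself is a simple set comparison. In this Python sketch, the included references and the search results are represented by hypothetical record identifiers:

```python
# Recall of the search against the review's included references:
# the fraction retrieved, plus the references that were missed and
# should be examined for omitted terms.
def recall(included_ids, retrieved_ids):
    included = set(included_ids)
    found = included & set(retrieved_ids)
    return len(found) / len(included), sorted(included - found)

rate, missed = recall(
    included_ids=[101, 102, 103, 104],        # hypothetical record IDs
    retrieved_ids=[101, 102, 104, 200, 201],
)
print(rate, missed)  # 0.75 [103]
```

Any missed reference that turns out to be available in the database points to terms that should be added to the strategy.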

A methodology for creating exhaustive search strategies has been created that describes all steps of the search process, starting with a question and resulting in thorough search strategies in multiple databases. Many of the steps described are not new, but together, they form a strong method for creating high-quality, robust searches in a relatively short time frame.

Our methodology is intended to make literature searches thorough. The optimization method described in step 11 will identify missed synonyms or thesaurus terms, unlike methods that depend largely on predetermined keywords and synonyms. Using this method also results in a much quicker search process than traditional methods allow, especially because of the easier translation between databases and interfaces (step 14). The method is not a guarantee of speed, since speed depends on many factors, including experience. However, by following the steps and using the tools described above, searchers can first gain confidence and then increase speed through practice.

What is new?

This method encourages searchers to start their search development process using empty syntax first and later adding the thesaurus terms and free-text synonyms. We feel this helps the searcher to focus on the search terms, instead of on the structure of the search query. The optimization method in which new terms are found in the already retrieved articles is used in some other institutes as well but has to our knowledge not been described in the literature. The macros to translate search strategies between interfaces are unique in this method.

What is different compared to common practice?

Traditionally, librarians and information specialists have focused on creating complex, multi-line (also called line-by-line) search strategies consisting of multiple record sets, and this method is frequently advised in the literature and handbooks [ 2 , 19 – 21 ]. Our method, instead, uses single-line searches, which is critical to its success. Single-line search strategies can be easily adapted by adding or dropping a term without having to renumber record sets, which would be necessary in multi-line searches. They can easily be saved in a text document and repeated by copying and pasting for search updates. Single-line search strategies also allow easy translation to other syntaxes using find-and-replace technology to update field codes and other syntax elements, or using macros (step 14).

When constructing a search strategy, the searcher might find that certain parentheses in the syntax are unnecessary: for example, when the title/abstract portion contains only one search term, when the proximity statement contains double parentheses, or when a word group consists of only one word. One might be tempted to omit those parentheses for ease of reading and management. However, during the optimization process, the searcher is likely to find extra synonyms that consist of one word. Adding those terms to a query with reduced parentheses requires adding extra parentheses (meticulously placing and counting them), whereas, in a query that retains the full syntax, it only requires proper placement of those terms.

Many search methods depend heavily on the PICO framework. Research shows that PICO or PICOS is not suitable for every question [ 22 , 23 ]. There are alternatives to PICO, such as sample, phenomenon of interest, design, evaluation, research type (SPIDER) [ 24 ], but each is just a variant. In our method, the most important and specific elements of a question are analyzed to build the best search strategy.

Though it is generally recommended that searchers search both MEDLINE and Embase, most use MEDLINE as the starting point. It is considered the gold standard for biomedical searching, partially due to historical reasons, since it was the first of its kind, and more so now that it is freely available via the PubMed interface. Our method can be used with any database as a starting point, but we use Embase instead of MEDLINE or another database for a number of reasons. First, Embase provides both unique content and the complete content of MEDLINE. Therefore, searching Embase will be, by definition, more complete than searching MEDLINE only. Second, the number of terms in Emtree (the Embase thesaurus) is three times as high as that of MeSH (the MEDLINE thesaurus). It is easier to find MeSH terms after all relevant Emtree terms have been identified than to start with MeSH and translate to Emtree.

At Erasmus MC, the researchers sit next to the information specialist during most of the search strategy design process. This way, the researchers can deliver immediate feedback on the relevance of proposed search terms and retrieved references. The search team then combines knowledge about databases with knowledge about the research topic, which is an important condition to create the highest quality searches.

Limitations of the method

One disadvantage of single-line searches compared to multi-line search strategies is that errors are harder to recognize. However, with the optimization methods described in step 11, errors are recognized easily because missed synonyms and spelling errors are identified during the process. Another objection is that more parentheses are needed, making it more difficult for the searcher and others to assess the logic of the search strategy. However, because parentheses and field codes are typed before the search terms are added (step 10), errors in parentheses can be prevented.

Our method works best in an interface that allows proximity searching. Searchers with access to an interface with proximity searching capabilities should select one of those as the initial database in which to develop and optimize the search strategy. Because the PubMed interface does not allow proximity searches, phrases or Boolean “AND” combinations are required there. Phrase searching is more specific, with a higher risk of missing relevant articles, whereas Boolean “AND” combinations increase sensitivity at an often high loss of specificity. Searchers who lack access to subscription-based databases or interfaces may have to rely on the freely available PubMed interface, though PubMed should never be the sole database used for an SR [ 2 , 16 , 25 ]. A limitation of our method is therefore that it works best with subscription-based and licensed resources.

Another limitation is that the macros are customized to a specific institution’s resources: the macros for translation between database interfaces only work between the interfaces as described. To mitigate this, we recommend using the find-and-replace functionality of text editors such as Microsoft Word to ease the translation of syntaxes between other databases. Depending on one’s institutional resources, custom macros can be developed using similar methods.
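The find-and-replace approach can be sketched in a few lines of code. The replacement rules below are simplified assumptions for illustration only: Emtree and MeSH terms do not map one-to-one, collapsing proximity operators into Boolean “AND” trades specificity for sensitivity as discussed above, and PubMed applies field tags per term rather than per group, so the output still needs manual review:

```python
import re

# Illustrative sketch of scripted find-and-replace translation from
# Embase.com syntax toward PubMed syntax. The rules are simplified
# assumptions: Emtree and MeSH do not map one-to-one, NEAR -> AND makes
# the search more sensitive but less specific, and PubMed field tags
# apply per term, so a group tag like (...)[tiab] still needs to be
# distributed by hand. Manual review of the result remains essential.

RULES = [
    (r"'([^']+)'/exp", r'"\1"[Mesh]'),   # exploded thesaurus term
    (r"'([^']+)'", r'"\1"'),             # quoted multi-word term
    (r"\s+NEAR/\d+\s+", " AND "),        # proximity -> AND (more sensitive)
    (r"\s+NEXT/\d+\s+", " "),            # adjacency -> phrase fragment
    (r":ab,ti", "[tiab]"),               # title/abstract field code
]

def embase_to_pubmed(query):
    for pattern, repl in RULES:
        query = re.sub(pattern, repl, query)
    return query

q = "'heart infarction'/exp OR ('heart attack' NEAR/3 symptom*):ab,ti"
print(embase_to_pubmed(q))
```

A real macro would chain many more such rules, but the principle is the same as the Word find-and-replace workflow described above.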

Results of the method

Whether this method results in exhaustive searches in which no important article is missed is difficult to determine, because the total number of relevant articles is unknown for any topic. A comparison of several parameters of 73 published reviews based on searches developed with this method against 258 reviews that acknowledged information specialists from other Dutch academic hospitals shows that the performance of searches following our method is comparable to that of searches performed at other institutes, while the time needed to develop the search strategies was much shorter than the time reported for the other reviews [ 9 ].

CONCLUSIONS

With the described method, searchers can gain confidence in their search strategies by finding many relevant words and creating exhaustive search strategies quickly. The approach can be used for SR searches or for other purposes, such as answering clinical questions, with different expectations of the search’s precision and recall. With practice, this method provides a stepwise approach that facilitates the search strategy development process from question clarification to final iteration and beyond.

SUPPLEMENTAL FILE

ACKNOWLEDGMENTS

We highly appreciate the work of our former colleague Louis Volkers, who in his twenty years as an information specialist at Erasmus MC laid the foundation for our method. We thank Professor Oscar Franco for reviewing earlier drafts of this article.



Title: Grid Search, Random Search, Genetic Algorithm: A Big Comparison for NAS

Abstract: In this paper, we compare the three most popular algorithms for hyperparameter optimization (grid search, random search, and genetic algorithm) and apply them to neural architecture search (NAS). We use these algorithms to build a convolutional neural network (search architecture). Experimental results on the CIFAR-10 dataset demonstrate the performance differences between the compared algorithms in terms of execution time and the accuracy of the proposed models.
Comments: 11 pages, 5 figures, 3 tables
Subjects: Machine Learning (cs.LG); Neural and Evolutionary Computing (cs.NE); Machine Learning (stat.ML)



  • Open access
  • Published: 14 January 2021

Quantifying computational advantage of Grover’s algorithm with the trace speed

Valentin Gebhart, Luca Pezzè & Augusto Smerzi

Scientific Reports volume 11, Article number: 1288 (2021)


  • Quantum information
  • Theoretical physics

Despite intensive research, the physical origin of the speed-up offered by quantum algorithms remains mysterious. No general physical quantity, like, for instance, entanglement, can be singled out as the essential useful resource. Here we report a close connection between the trace speed and the quantum speed-up in Grover’s search algorithm implemented with pure and pseudo-pure states. For a noiseless algorithm, we find a one-to-one correspondence between the quantum speed-up and the polarization of the pseudo-pure state, which can be connected to a wide class of quantum statistical speeds. For time-dependent partial depolarization and for interrupted Grover searches, the speed-up is specifically bounded by the maximal trace speed that occurs during the algorithm operations. Our results quantify the quantum speed-up with a physical resource that is experimentally measurable and related to multipartite entanglement and quantum coherence.


Introduction

Understanding and quantifying the key resource for the speed-up of quantum computations 1 , 2 has been a highly disputed topic over the past few decades 3 . There has been particular interest in the role played by entanglement 4 , 5 , 6 , 7 , 8 , 9 , 10 , 11 , 12 , 13 . It is known that exponential speed-up of quantum algorithms implemented with pure states requires multipartite entanglement 8 , 9 . However, it was shown that a polynomial advantage can be achieved without entanglement 14 . It also remains an open question whether exponential quantum advantage can be reached in mixed-state algorithms in the absence of entanglement. In this case, other quantum correlations such as quantum discord have been indicated as possible candidates for computational resources 3 , 15 . Furthermore, it was shown that several entanglement measures cannot quantify the advantages of many quantum algorithms 13 . Other possible resources have been considered, such as coherence 16 , 17 , 18 , distinguishability 3 , contextuality 19 , tree size 20 and interference 21 . In short, despite extensive research, understanding the resource behind the speed-up of quantum computations, even for a benchmark algorithm such as Grover’s, remains an open and urgent question.

Quantum statistical speeds 22 , 23 , 24 , 25 , 26 offer a possible approach to quantify useful resources in quantum technology tasks. As a major example, the quantum Fisher information 25 , 27 , which is the quantum statistical speed associated with the Bures distance 25 , was shown to fully characterize metrologically useful entanglement 28 , 29 , 30 , that is, the entanglement necessary for sub-shot-noise phase estimation sensitivities 31 , 32 . One might conjecture that different statistical speeds may be useful to characterize the performance of different quantum tasks. Here, we use the trace speed ( \({\text {TS}}\) ), namely, the statistical speed associated with the trace distance 1 , 23 , to quantify the speed-up in Grover’s algorithm 33 in both absence and presence of dephasing. In particular, we show that in the pseudo-pure model without dephasing, the speed-up is completely determined by the polarization of the pseudo-pure state, which can be linked to a wide class of quantum statistical speeds. For general pseudo-pure dephasing models 34 , we prove that the maximal \({\text {TS}}\) occurring during the algorithm bounds the speed-up, rendering it a necessary resource for quantum advantage. The \({\text {TS}}\) is an experimentally relevant measure of quantum coherence (asymmetry) 35 , 36 and witnesses multipartite entanglement 26 . To our knowledge, this is the first result for a physical resource in Grover’s algorithm that generalizes to mixed-state versions. This can pave the way to a new approach to investigating useful resources in quantum computations.

Grover’s algorithm and its cost

Grover’s search algorithm 33 is one of the most important protocols of quantum computation 1 , 2 . It searches an unstructured database of N elements for a target \(\omega\) . The target is marked in the sense that one is given a test function f that vanishes for all elements but \(\omega\) . The task is to identify \(\omega\) with as few function calls as possible. In the quantum version of the algorithm, a function call can be used as a measurement or as an application of a corresponding unitary, the so-called oracle unitary. As we will discuss shortly, Grover’s algorithm admits a quadratic advantage to classical search algorithms. To utilize the exponential size of the dimensionality of composite quantum systems 6 , we encode all different elements x of the register into computational basis states of \(n=\log _2 N\) qubits, \(x\in \left\{ 0,1\right\} ^{ n}\) . Grover’s algorithm is performed by preparing the system in the register state \(\left| \psi _{\rm {in}} \right\rangle =1/\sqrt{2^n}\sum _x \left| x \right\rangle\) , where \(\left| x \right\rangle\) are the computational basis vectors, followed by k applications of the Grover unitary \(G=U_d U_\omega\) . Here, the oracle unitary \(U_\omega =1-2\left| \omega \right\rangle \left\langle \omega \right|\) represents a function call and the Grover diffusion operator is defined as \(U_d=2\left| \psi _{\rm {in}} \right\rangle \left\langle \psi _{\rm {in}} \right| -1\) . After k iterations of the Grover unitary, the state of the system is given by 2
\(\left| \psi _k \right\rangle =\sin [(2k+1)\theta ]\left| \omega \right\rangle +\cos [(2k+1)\theta ]\left| \omega ^\perp \right\rangle ,\)   (1)

where \(\theta =\arcsin (1/\sqrt{2^n})\) and \(\left| \omega ^\perp \right\rangle =1/\sqrt{2^n-1}\sum _{x\ne \omega }\left| x \right\rangle\) is the projection of the initial state on the subspace orthogonal to \(\left| \omega \right\rangle\) . This yields a probability \(p_k\) of finding the target state as \(p_k=\sin ^2 [(2k+1)\theta ]\) . After \(k_{\rm {Gr}}\approx (\pi /4) \sqrt{2^n}\) iterations one finds the target state \(\left| \omega \right\rangle\) with probability \(p_{k_{\rm {Gr}}}=1-{\mathcal {O}}(1/2^n)\) 2 .
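The success probability formula can be checked numerically. The following is a small sketch (the choice n = 10 is illustrative):

```python
import math

# Numerical sketch of the success probability p_k = sin^2((2k+1)*theta),
# theta = arcsin(1/sqrt(2^n)): after k_Gr ~ (pi/4)*sqrt(2^n) iterations
# the target is found almost certainly (n = 10 is an illustrative choice).

def p_success(k, n):
    theta = math.asin(1 / math.sqrt(2**n))
    return math.sin((2 * k + 1) * theta) ** 2

n = 10
k_gr = round(math.pi / 4 * math.sqrt(2**n))
print(k_gr, p_success(k_gr, n))   # 25 iterations, p > 0.999
```

At k = 0 the success probability is just 1/2^n, the classical single-guess value, which is why the quadratic growth in k is the source of the advantage.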

One defines the cost C for a general search algorithm as the average number of applications of the test function f (or its corresponding oracle unitary) required to find the target state 34 . Simply counting the oracle applications is also known as query complexity 2 , while other complexities such as gate complexity are usually not considered in Grover’s algorithm (see 6 for a discussion).

In the classical search algorithm, the query application can be thought of as opening one of \(2^n\) boxes, where each box represents one state of the register. For an unstructured search, i.e., one in which each iteration randomly opens one of the \(2^n\) boxes, the average number of steps needed to find the target state is \(C_{\rm {cl}}=2^n\) . If one remembers the outcome of all previous searches, the cost is reduced to \(C_{\rm {cl}}=2^{n}/2+{\mathcal {O}}(1)\) 34 . Note that \(C_{\rm {cl}}\) scales with \(2^n\) in both cases.
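A quick Monte Carlo sketch (our illustration; the box count and trial number are arbitrary choices) reproduces both classical costs:

```python
import random

# Monte Carlo sketch of the classical search costs quoted above: opening
# random boxes with replacement costs ~2^n tries on average, while
# remembering previous outcomes (sampling without replacement) halves
# this to ~2^n/2. The box count and trial number are illustrative.

random.seed(1)
N = 2**8          # 2^n boxes, target placed in box 0
TRIALS = 20_000

def cost_with_replacement():
    steps = 0
    while True:
        steps += 1
        if random.randrange(N) == 0:
            return steps

def cost_without_replacement():
    order = random.sample(range(N), N)   # open each box once, random order
    return order.index(0) + 1

mean_forgetful = sum(cost_with_replacement() for _ in range(TRIALS)) / TRIALS
mean_memory = sum(cost_without_replacement() for _ in range(TRIALS)) / TRIALS
print(mean_forgetful, mean_memory)      # about 256 and about 128.5
```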

In a quantum search algorithm, one uses k oracle unitaries and a final oracle measurement yielding the target with probability \(p_k\) , such that the cost is given by 34
\(C_{\rm {qu}}(k)=\frac{k+1}{p_k}.\)   (2)

Hence, the optimal cost is obtained by minimizing \(C_{\rm {qu}}(k)\) over the number of oracle applications, \(C_{\rm {qu}}=\min _k(k+1)/(p_k)\) . Let us emphasize that this definition of the cost does not distinguish between applying the oracle as a unitary or as a measurement observable.

In Grover’s algorithm, the cost function Eq. ( 2 ) is not necessarily minimal for the highest success probability \(p_k\) of a single search 37 . However, the optimal number of steps \({\tilde{k}}_{\rm {Gr}}\) and the optimal cost \(C_{\rm {qu}}\) for large n still scale as \({\tilde{k}}_{\rm {Gr}}= r \sqrt{2^n}\) and \(C_{\rm {qu}}=K \sqrt{2^n}\) , where r is the solution of \(\tan (2r)=4r\) and \(K=r/\sin ^2(2r)\) , yielding the quadratic speed-up over \(C_{\rm {cl}}\) . It was shown that this speed-up is optimal 37 , 38 .
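The constants r and K, and the quadratic scaling of the optimal cost, can be reproduced numerically. This sketch (n = 16 is an illustrative choice) solves tan(2r) = 4r by bisection and compares K·sqrt(2^n) against a brute-force minimization of (k+1)/p_k:

```python
import math

# Sketch reproducing the optimal-cost constants: r solves tan(2r) = 4r
# and K = r / sin^2(2r) ~ 0.69; a brute-force minimization of the cost
# (k+1)/p_k then confirms C_qu ~ K*sqrt(2^n) (n = 16 is illustrative).

def solve_r(lo=0.1, hi=0.75, iters=80):
    """Bisection for the nonzero root of f(r) = tan(2r) - 4r on [lo, hi]."""
    f = lambda r: math.tan(2 * r) - 4 * r
    for _ in range(iters):
        mid = (lo + hi) / 2
        if f(lo) * f(mid) <= 0:
            hi = mid
        else:
            lo = mid
    return (lo + hi) / 2

r = solve_r()
K = r / math.sin(2 * r) ** 2

n = 16
theta = math.asin(1 / math.sqrt(2**n))
cost = min((k + 1) / math.sin((2 * k + 1) * theta) ** 2
           for k in range(1, 2**(n // 2 + 1)))
print(r, K, cost / math.sqrt(2**n))
```

The brute-force minimum agrees with K·sqrt(2^n) up to corrections that vanish for large n.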

Grover’s algorithm can be executed on a single multimode system and, therefore, simply makes use of superposition and constructive interference 6 , 39 , 40 . However, in order to reduce exponential overhead in space, time or energy, one usually considers a system composed of many qubits 6 , 39 . In this case, different measures of bipartite and multipartite entanglement have been used to detect entanglement during Grover’s algorithm 41 , 42 , 43 , 44 , 45 . Genuine multipartite entanglement was shown to be present already after the first step of the noiseless algorithm 41 . However, the quantitative relationship between these measures and speed-up was not resolved. In particular, the methods could not be easily applied to any mixed state generalization of Grover’s algorithm. Quantum coherence 46 , 47 , 48 and quantum discord 46 have been considered as resources in the noiseless algorithm as well.

Quantifying speed-up in the unitary algorithm

In this section, we quantify the speed-up in a mixed-state generalization of Grover’s algorithm with quantum statistical speeds. We consider Grover’s algorithm with the register initialized in a pseudo-pure state 49 , while the algorithm is still implemented with unitary operations. For a pure n -qubit state \(\left| \psi \right\rangle\) , the corresponding pseudo-pure state \(\rho _{\psi ,\epsilon }\) with polarization \(\epsilon\) is defined as
\(\rho _{\psi ,\epsilon }=\epsilon \left| \psi \right\rangle \left\langle \psi \right| +(1-\epsilon )\frac{{\mathbb {I}}}{2^n}.\)   (3)

Pseudo-pure states represent one of the simplest models for mixed-state quantum computation and play a central role in the partial depolarizing noise model we consider later. We replace the pure initial state \(\left| \psi _{\rm {in}} \right\rangle\) with the pseudo-pure state \(\rho _{\psi _{\rm {in}},\epsilon }\) such that, after k Grover iterations, the state of the system is given by \(\rho _k=\epsilon \left| \psi _k \right\rangle \left\langle \psi _k \right| +(1-\epsilon ){\mathbb {I}}/2^n\) , with \(\left| \psi _k \right\rangle\) defined in Eq. ( 1 ). The probability \(p_k\) of finding the target state after k steps is \(p_k=\epsilon \sin ^2 ((2k+1)\theta )+(1-\epsilon )/2^n\) . Here we observe that for \(\epsilon ={\mathcal {O}}(1/2^n)\) , it becomes more efficient to just measure the state without any iteration because the probability contribution due to the Grover iteration is no longer dominant 7 . However, if \(\epsilon\) does not decrease exponentially with n , one can neglect the second term in \(p_k\) . Hence, the minimum of Eq. ( 2 ) occurs after the same number of steps as in the pure state algorithm while its minimal value \(C_{\rm {qu}}\) is simply \(C_{\rm {qu}}=C_{\rm {qu,pure}}/\epsilon\) , where \(C_{\rm {qu,pure}}\) is the cost of the pure state algorithm.
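A numerical sketch (illustrative parameters) confirms that halving the polarization doubles the optimal cost while leaving the optimal number of steps essentially unchanged:

```python
import math

# Sketch with illustrative parameters: for the pseudo-pure success
# probability p_k = eps*sin^2((2k+1)*theta) + (1-eps)/2^n, the admixed
# identity term is negligible when eps does not decay exponentially in n,
# so the optimal cost scales as C_qu = C_qu,pure / eps.

def cost(n, eps):
    theta = math.asin(1 / math.sqrt(2**n))
    def p(k):
        return eps * math.sin((2 * k + 1) * theta) ** 2 + (1 - eps) / 2**n
    return min((k + 1) / p(k) for k in range(1, 2**(n // 2 + 1)))

n = 16
ratio = cost(n, 0.5) / cost(n, 1.0)
print(ratio)   # close to 2
```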

In this model, the speed-up of the algorithm is completely determined by the depolarization \(\epsilon\) . Therefore, all quantities which can be connected to \(\epsilon\) are quantifiers of the speed-up. In particular, this holds for a general quantum statistical speed \({\text {QS}}\) with the property that for \(\epsilon \gg 1/2^n\) , \({\text {QS}}(\rho _{\psi ,\epsilon })=\epsilon {\text {QS}}(\left| \psi \right\rangle \left\langle \psi \right| )\) . Note that this property holds for a wide class of quantum statistical speeds, for instance, the generalized quantum Fisher information and the quantum Schatten speeds 26 . Given the relations \(\epsilon = {\text {QS}}(\rho _{\psi ,\epsilon })/ {\text {QS}}(\left| \psi \right\rangle \left\langle \psi \right| )\) and \(C_{\rm {qu}}=C_{\rm {qu,pure}}/\epsilon =K\sqrt{2^n}/\epsilon\) , we directly obtain the dependence of the cost function on the maximal \({\text {QS}}\) as
\(C_{\rm {qu}}=K\sqrt{2^n}\,\frac{{\text {QS}}^{\rm {pure}}_{\rm {max}}}{{\text {QS}}_{\rm {max}}},\)   (4)

where \(K=r/\sin ^2(2r)\approx 0.69\) with r being the solution of \(\tan (2r)=4r\) , and \({\text {QS}}^{\rm {pure}}_{\rm {max}}\) being the maximal \({\text {QS}}\) during the pure-state algorithm. The quantum speed-up \(S=C_{\rm {cl}}/C_{\rm {qu}}\) is thus given in terms of \({\text {QS}}_{\rm {max}}\) as
\(S=\frac{C_{\rm {cl}}}{K\sqrt{2^n}}\,\frac{{\text {QS}}_{\rm {max}}}{{\text {QS}}^{\rm {pure}}_{\rm {max}}}.\)   (5)

As we discuss in more detail in the Methods section, for \(\epsilon\) below a critical polarization \(\epsilon _c=K/\sqrt{2^n}\) , a classical search becomes advantageous. For \(\epsilon >\epsilon _c\) , the above results are valid. Also, for small values of n , a rigorous computation of the cost has to be performed.

As an example of a quantum statistical speed that will play an important role in the next section, consider the specific case of the trace speed ( \({\text {TS}}\) ). The \({\text {TS}}\) is the susceptibility of a quantum state \(\rho\) to unitary displacements generated by a generic Hamiltonian H 35 . That is, the \({\text {TS}}\) quantifies the distinguishability between \(\rho\) and \(\rho (t)=e^{-iHt}\rho e^{iHt}\) for small t . It is defined as 1 , 26 , 35
\({\text {TS}}(\rho ,H)=\left\Vert \left[ \rho ,H \right] \right\Vert _1,\)   (6)

where \([\cdot ,\cdot ]\) is the commutator and \(\left\Vert \cdot \right\Vert _1\) is the \(l_1\) -norm, defined as \(\left\Vert A \right\Vert _1 = {\text {tr}}\left[ \sqrt{A^\dag A} \right]\) for a generic operator A . Since \({\text {TS}}(\rho _{\psi ,\epsilon })=\left\Vert \left[ \rho _{\psi ,\epsilon }, H \right] \right\Vert _1=\left\Vert \epsilon \left[ \left| \psi \right\rangle \left\langle \psi \right| , H \right] +(1-\epsilon )/2^n\left[ {\mathbb {I}} , H \right] \right\Vert _1=\epsilon {\text {TS}}(\left| \psi \right\rangle \left\langle \psi \right| )\) , the \({\text {TS}}\) can be used as the \({\text {QS}}\) in Eq. ( 5 ).
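The trace-norm definition and the ε-scaling used here can be verified directly for a small register. This sketch (n = 3 and the collective-spin Hamiltonian are illustrative choices) checks that the identity admixture drops out of the commutator:

```python
import numpy as np

# Small-register check (n = 3, collective-spin H are illustrative) of the
# trace speed TS(rho, H) = ||[rho, H]||_1 and its pseudo-pure scaling:
# the identity part of rho commutes with any H, so
# TS(rho_psi,eps) = eps * TS(|psi><psi|) holds exactly.

def trace_norm(A):
    """l1 (trace) norm: sum of singular values."""
    return np.linalg.svd(A, compute_uv=False).sum()

def trace_speed(rho, H):
    return trace_norm(rho @ H - H @ rho)

n, eps = 3, 0.4
dim = 2**n
psi = np.ones(dim) / np.sqrt(dim)                  # uniform superposition
pure = np.outer(psi, psi)
pseudo = eps * pure + (1 - eps) * np.eye(dim) / dim

sz = np.diag([0.5, -0.5])                          # sigma_z / 2
Jz = sum(np.kron(np.kron(np.eye(2**i), sz), np.eye(2**(n - 1 - i)))
         for i in range(n))                        # J_z = sum_i sigma_z^(i)/2

print(trace_speed(pure, Jz), trace_speed(pseudo, Jz))
```

For a pure state the trace norm of the commutator equals twice the standard deviation of H, so here TS(pure, Jz) = sqrt(n) = sqrt(3).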

In general, \({\text {TS}}\) is a measure of coherence, in this case usually referred to as asymmetry 35 : a state with no coherence with respect to H , namely a classical mixture of its eigenstates, will not change under phase displacements, while off-diagonal matrix elements (coherences) of \(\rho\) are responsible for a finite susceptibility to phase displacements. The \({\text {TS}}\) is upper bounded by the quantum Fisher information 23 . If the system is a composite system of n qubits and H is the sum of local Hamiltonians \(H_i\) , \(H=\sum _{i=1}^n H_i\) with \({\text {spec}}(H_i)=\left\{ -1/2,1/2\right\}\) and \({\text {TS}}(\rho ,H)>\sqrt{nr}\) , it follows that \(\rho\) has to be at least \((r+1)\) -partite entangled 26 , 29 , 30 , namely, it cannot be written as a mixture of r -producible pure states. A pure state is r -producible if it is a tensor product of subsystems with each subsystem containing at most r qubits. Since the value of \({\text {TS}}\) depends on the generating Hamiltonian H , we consider the optimization over all Hamiltonians of the above form. When the whole evolution is restricted to the completely symmetric subspace, it suffices to perform this optimization over collective spin Hamiltonians, \(H_i={\mathbf {n}}\cdot \varvec{\sigma }^{(i)}/2\) , where \({\mathbf {n}}\) is a point on the unit sphere and \(\varvec{\sigma }^{(i)}\) are the Pauli operators for the i -th qubit. For pure states \(\left| \psi \right\rangle\) , the optimized \({\text {TS}}\) coincides with the square root of the largest eigenvalue of the matrix \(\Gamma _{ij}=4\left( {\text {Re}}[\left\langle J_iJ_j\right\rangle ]-\left\langle J_i\right\rangle \left\langle J_j\right\rangle \right)\) 29 . Here, \(J_m=\sum _{i=1}^n{\mathbf {e}}_m\cdot \varvec{\sigma } ^{(i)}/2\) is the coherent spin operator in \({\mathbf {e}}_m\) -direction, \(m=x,y,z\) , and \(\langle \cdot \rangle\) is the expectation value with respect to the state \(\left| \psi \right\rangle\) .
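As a check of the Γ-matrix recipe (the GHZ state is our illustrative choice, not taken from the paper), the optimized TS of the n-qubit GHZ state equals n, which exceeds sqrt(n(n-1)) and thus witnesses genuine n-partite entanglement:

```python
import numpy as np

# Check of the Gamma-matrix recipe for the optimized pure-state trace
# speed: TS_opt = sqrt(lambda_max(Gamma)) with
# Gamma_ij = 4*(Re<J_i J_j> - <J_i><J_j>). The n-qubit GHZ state (an
# illustrative choice, not from the paper) gives TS_opt = n, exceeding
# sqrt(n*(n-1)) and hence witnessing n-partite entanglement.

n = 3
dim = 2**n
pauli = {"x": np.array([[0, 1], [1, 0]], dtype=complex),
         "y": np.array([[0, -1j], [1j, 0]]),
         "z": np.array([[1, 0], [0, -1]], dtype=complex)}

def collective_spin(axis):
    """J_m = sum_i sigma_m^(i) / 2 acting on n qubits."""
    J = np.zeros((dim, dim), dtype=complex)
    for i in range(n):
        op = np.eye(1, dtype=complex)
        for j in range(n):
            op = np.kron(op, pauli[axis] / 2 if j == i else np.eye(2))
        J += op
    return J

ghz = np.zeros(dim); ghz[0] = ghz[-1] = 1 / np.sqrt(2)
J = [collective_spin(a) for a in "xyz"]
mean = [(ghz @ Ji @ ghz).real for Ji in J]
Gamma = np.array([[4 * ((ghz @ J[i] @ J[j] @ ghz).real - mean[i] * mean[j])
                   for j in range(3)] for i in range(3)])
ts_opt = np.sqrt(np.linalg.eigvalsh(Gamma).max())
print(ts_opt)
```

The largest eigenvalue comes from the z-direction, where Gamma_zz = 4 Var(J_z) = n^2 for the GHZ state.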

Let us first discuss the \({\text {TS}}\) for the standard version of Grover’s algorithm implemented with pure states and unitary evolution, as introduced above. Without loss of generality, we consider \(\left| \omega \right\rangle =\left| 0 \right\rangle ^{\otimes n}\) . This choice corresponds merely to a relabeling of the computational basis vectors of each qubit. Importantly, by this relabeling, the dynamics of the Grover search, as well as the optimized \({\text {TS}}\) of the state during the evolution, are not altered, while the calculation of \({\text {TS}}\) is highly facilitated. Since \(\left| \psi _{\rm {in}} \right\rangle\) and \(\left| 0 \right\rangle ^{\otimes n}\) are elements of the completely symmetric subspace and G commutes with all permutations of the qubits, the complete evolution is restricted to the symmetric subspace, facilitating the computation of \({\text {TS}}\) . By neglecting terms in \({\mathcal {O}}(1/2^n)\) , one can exactly compute the largest eigenvalue of \(\Gamma _{ij}\) at any step k , yielding the optimized \({\text {TS}}\) , see Methods for details. In Fig.  1 , we show the optimized \({\text {TS}}(k)\) for \(n=30\) qubits. The initially separable state \(\left| \psi _{\rm {in}} \right\rangle\) evolves into a multipartite entangled state already after the first oracle operation. Multipartite entanglement further increases until reaching a maximal value of
\({\text {TS}}^{\rm {pure}}_{\rm {max}}=\sqrt{\frac{n(n+1)}{2}},\)   (7)

which occurs at \(k=k_{\rm {Gr}}/2\) . This detects \((n/2+1)\) -partite entanglement during the pure state Grover algorithm. For \(k>k_{\rm {Gr}}/2\) , multipartite entanglement detected by the \({\text {TS}}\) decreases until the algorithm reaches the separable target state \(\left| \omega \right\rangle\) .

Figure 1

The dependence of the optimized trace speed \({\text {TS}}\) on the iteration step k in the pure state Grover’s algorithm (solid line). The dashed lines indicate thresholds above which \({\text {TS}}\) detects bipartite ( \(\sqrt{n}\) ), three-partite ( \(\sqrt{2n}\) ) and ( \(n/2+1\) )-partite ( \(\sqrt{n^2/2}\) ) entanglement. Here, \(n=30\) , \(k_{\rm {Gr}}\approx (\pi /4)\sqrt{2^n}\) .

Since \({\text {TS}}(\rho _{\psi ,\epsilon },H)=\epsilon {\text {TS}} (\left| \psi \right\rangle \left\langle \psi \right| ,H)\) , the maximal \({\text {TS}}\) during the Grover algorithm using a pseudo-pure initial state with polarization \(\epsilon\) is \({\text {TS}}_{\rm {max}}=\epsilon \sqrt{n(n+1)/2}\) . Hence, \({\text {TS}}\) witnesses \(\epsilon (n+1)/2\) -partite entanglement. Note that for \(\epsilon <2/(n+1)\) , \({\text {TS}}\) does not detect entanglement anymore. It was already observed that for polarizations \(\epsilon >1/2^{n/2}\) the algorithm still offers a speed-up 3 , 50 , 51 , indicating that entanglement detected by \({\text {TS}}\) is not necessary for quantum speed-up.

Trace speed and the algorithm under partial depolarization

The results of pseudo-pure initial states can be generalized to search dynamics subject to time-dependent partial depolarization (see Refs. 52 , 53 for earlier investigations). In this case, the state after k steps of the algorithm is given by
\(\rho _k=\epsilon (k)\left| \psi _k \right\rangle \left\langle \psi _k \right| +(1-\epsilon (k))\frac{{\mathbb {I}}}{2^n},\)   (8)

where the now time-dependent decreasing polarization \(\epsilon (k)\) represents both initial impurity and partial depolarization during the algorithm. The depolarization channel is a widely used noise model whenever the exact form of the noise is not known 54 . As a worst case noise scenario, the knowledge of the state is completely erased with some probability. As can be seen in Fig.  2 , different polarization functions \(\epsilon (k)\) with the same final polarization \(\epsilon _{\rm {f}}\) can lead to different maximal \({\text {QS}}\) during the iteration. While the one-to-one correspondence between the \({\text {QS}}\) and the speed-up is generally lost, as shown below, we can still bound the speed-up using the \({\text {TS}}\) .

Figure 2

Trace speed during the pseudo-pure version of Grover’s algorithm. Polarizations \(\epsilon (k)\) (orange lines) and trace speeds \({\text {TS}}(k)\) (blue lines) for an initial pseudo-pure state without dephasing (solid), an initial pure state with linearly decaying polarization (dotted) and an initial pure state with exponentially decaying polarization (dash-dotted). Here, \(n=30\) , \(\epsilon _{\rm {f}}=0.3\) , \(k_{\rm {Gr}}\approx (\pi /4)\sqrt{2^n}\) .

For a partial depolarization during the algorithm it turns out that, in general, it is optimal to stop the iterations and perform the final measurement already at earlier steps \(k_{\rm {int}}<{\tilde{k}}_{\rm {Gr}}\) 52 . We divide the examination into the cases \(k_{\rm {int}}\le k_{\rm {Gr}}/2\) and \(k_{\rm {int}}\ge k_{\rm {Gr}}/2\) , that is, whether we interrupt the iteration before or after the pure state algorithm would have already reached its maximal \({\text {TS}}\) , see Fig.  1 . In the case \(k_{\rm {int}}\ge k_{\rm {Gr}}/2\) , the cost can be bounded by \(C_{\rm {qu}}\ge K \sqrt{2^n}/\epsilon (k_{\rm {int}})\) . This is because if one could completely stop the dephasing from this point, one could reduce the cost until reaching the optimal value of \(K \sqrt{2^n}/\epsilon (k_{\rm {int}})\) , see Eq. ( 4 ). Since \(k_{\rm {int}}\ge k_{\rm {Gr}}/2\) , we have \(\epsilon (k_{\rm {int}})\le \epsilon (k_{\rm {Gr}}/2)\) and \({\text {TS}}(k_{\rm {Gr}}/2)\le {\text {TS}}_{\rm {max}}\) ( \({\text {TS}}_{\rm {max}}\) is the maximal \({\text {TS}}\) before the interruption). Finally, using \(\epsilon (k_{\rm {Gr}}/2)={\text {TS}}(k_{\rm {Gr}}/2)/{\text {TS}}^{\rm {pure}}_{\rm {max}}\) , one can then bound \(C_{\rm {qu}}\ge K \sqrt{2^n}/\epsilon (k_{\rm {Gr}}/2) \ge (K \sqrt{2^n} {\text {TS}}_{\rm {max}}^{\rm {pure}})/({\rm {TS}}_{\rm {max}})\) , yielding the following bound
\(S\le \frac{C_{\rm {cl}}}{K\sqrt{2^n}}\,\frac{{\text {TS}}_{\rm {max}}}{{\text {TS}}^{\rm {pure}}_{\rm {max}}}.\)   (9)

The case \(k_{\rm {int}}\le k_{\rm {Gr}}/2\) corresponding to strong dephasing becomes more technical since, in the early regime, the maximal \({\text {TS}}\) is not simply bounded by \(\epsilon (k){\text {TS}}^{\rm {pure}}_{\rm {max}}\) . However, as we show in Methods, by using the explicit form of the \({\text {TS}}\) , the bound Eq. ( 9 ) still holds. At this point, the \({\text {TS}}\) stands out from other quantum statistical speeds. For instance, the bound does not hold when using the quantum Fisher information as \({\text {QS}}\) . These results for the case of an interruption of the iteration due to minimization of the cost can also be applied to the case of a general interruption of the iteration. Stopping the algorithm at any time will yield an average speed-up which is always bounded by the maximal \({\text {TS}}\) occurring before the interruption.

Discussion and conclusions

To summarize, we showed that in both the pure-state version of the Grover search algorithm and a general pseudo-pure generalization, the trace speed ( \({\text {TS}}\) ) can be used to quantify and bound the possible speed-up over a classical search. These results offer an unprecedented connection between the speed-up in Grover’s algorithm and a physical resource beyond the case of ideal, noiseless quantum algorithms. The \({\text {TS}}\) relates the computational speed of Grover’s algorithm to both multipartite entanglement and quantum coherence. It should be noted that the relation with multipartite entanglement depends on the n -qubit implementation that we have considered, while the algorithm can also be implemented with a single \(2^n\) -level system 6 . Indeed, as mentioned above, the operating principle of the algorithm and the number of queries used (which determines the cost) do not depend on which implementation we use. Therefore, multipartite entanglement cannot be considered the key resource for the quantum speed-up. We thus argue that the correct interpretation of our result is that the resource for the speed-up in query complexity is quantum coherence as captured by the \({\text {TS}}\) . However, multipartite entanglement is crucial for reducing other costs such as space or energy 6 . We point out that the interpretation of the \({\text {TS}}\) as quantum coherence holds for any implementation of the algorithm.

The role of quantum coherence during the noiseless Grover’s algorithm has already been investigated in Refs. 46 , 47 . These works found a one-to-one correspondence between the \(l_1\) -norm of coherence, which decreases during the algorithm, and the increasing success probability. Neither approach has been generalized to mixed-state versions of the algorithm. In our case, a different measure of coherence, namely the \({\text {TS}}\) , is connected to the average cost of the algorithm. It reaches its maximal value during the algorithm and offers a physical resource also for pseudo-pure generalizations. Refs. 46 , 47 use the \(l_1\) -norm of coherence and the relative entropy of coherence, which identify different states as highly coherent than the \({\text {TS}}\) does. For instance, while the \(l_1\) -norm detects the initial state \(\left| \psi _{\rm {in}} \right\rangle =1/\sqrt{2^n}\sum _x \left| x \right\rangle\) as maximally coherent, the \({\text {TS}}\) would detect \((\left| 0 \right\rangle ^{\otimes n}+\left| 1 \right\rangle ^{\otimes n})/\sqrt{2}\) as maximally coherent. For a discussion of this distinction between so-called speakable and unspeakable coherence, see for instance Ref. 55 .

Finally, we emphasize that the \({\text {TS}}\) can be measured or efficiently bounded experimentally. Following Refs. 56, 57, one measures the Kolmogorov distance between the probability distributions of \(\rho (0)\) and \(\rho (t)\) for a given measurement observable. A quadratic series expansion of the Kolmogorov distance for sufficiently small t yields the Kolmogorov speed, which is a lower bound to the TS and depends on the considered measurement observable. The TS is obtained by maximizing the Kolmogorov speed over all possible observables (Ref. 1).

To conclude, the analysis of the \({\text {TS}}\) might inspire further investigations of the still open search for the origins and quantification of quantum advantage. In particular, one could check the importance of the \({\text {TS}}\) and other quantum statistical speeds for other oracle-based quantum algorithms, e.g., the Deutsch-Jozsa algorithm or Simon’s algorithm, or for general quantum technology tasks. Also, whether or not the \({\text {TS}}\) is a necessary resource in different noisy variations of Grover’s algorithm merits further investigation. More general dephasing models or unitary noise could be considered, although they render the analysis more cumbersome. Overall, our results suggest that quantum statistical speeds can be used to recognize useful properties of quantum states for different quantum technology tasks.

Cost dependence for small polarizations

As discussed in the main text, for initial polarizations \(\epsilon \sim 1/2^{n}\), a classical search is less costly than performing the Grover iterations. Here, we discuss the behavior of the cost in this regime. We observe numerically that for polarizations \(\epsilon\) above a critical value \(\epsilon _c\), the minimum of the cost Eq. (2) is always obtained after \({\tilde{k}}_{\rm {Gr}}= r \sqrt{2^n}\) iteration steps, where the cost is given by \(C_{\rm {qu}}=K\sqrt{2^n}/\epsilon\) (\(K\approx 0.69\), see main text). For \(\epsilon <\epsilon _c\), the cost is minimized for \(k=0\) iterations, that is, by performing a classical search with \(C_{\rm {qu}}=2^n\). The exact value of \(\epsilon _c\) is given by the equality of the costs, \(K\sqrt{2^n}/\epsilon =2^n\), yielding \(\epsilon _c = K/\sqrt{2^n}\).

Note that for small n , the minimization of the cost has to be performed on a discrete grid. Therefore, a case study for each n has to be performed.
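The discrete-grid minimization described above is easy to reproduce numerically. The following NumPy sketch (the function names are ours, not from the paper) minimizes the cost \((k+1)/(\epsilon \sin ^2((2k+1)\theta ))\) over integer k and computes the critical polarization from the cost equality; numerically one finds the minimum at \(k\approx 0.58\sqrt{2^n}\) with cost close to \(K\sqrt{2^n}/\epsilon\).

```python
import numpy as np

def grover_cost(n, eps, k_max=None):
    """Average cost (k+1) / (eps * sin^2((2k+1) * theta)) of stopping the
    pseudo-pure Grover search after k iterations, minimized over the
    discrete grid k = 1, ..., k_max."""
    theta = np.arcsin(2.0 ** (-n / 2))
    if k_max is None:
        k_max = int(2 * np.sqrt(2 ** n))
    k = np.arange(1, k_max + 1)
    cost = (k + 1) / (eps * np.sin((2 * k + 1) * theta) ** 2)
    best = int(np.argmin(cost))
    return int(k[best]), float(cost[best])

def critical_polarization(n, K=0.69):
    """eps_c obtained from the cost equality K * sqrt(2^n) / eps_c = 2^n."""
    return K / np.sqrt(2 ** n)
```

For \(\epsilon > \epsilon _c\) the returned minimum lies below the classical cost \(2^n\); for \(\epsilon < \epsilon _c\) it exceeds it, so the classical search (k = 0) wins.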

Optimized trace speed during the pure-state algorithm

Here, we describe the derivation of the analytical formula for the optimized trace speed during the algorithm. Since the evolution of the algorithm is restricted to the symmetric subspace, the optimized trace speed \({\text {TS}}\) during the algorithm is given by the square root of the largest eigenvalue of \(\Gamma _{ij}=4\left( {\text {Re}}[\left\langle J_iJ_j\right\rangle ]-\left\langle J_i\right\rangle \left\langle J_j\right\rangle \right)\) (Ref. 29), where the expectation value \(\langle \cdot \rangle\) is computed with respect to the state \(\left| \psi _k \right\rangle = \sin [(2k+1)\theta ]\left| \omega \right\rangle +\cos [(2k+1)\theta ]\left| \omega ^\perp \right\rangle\), cf. Eq. (1). The computation is straightforward; for instance, for \(J_x =\sum _{i=1}^n\sigma ^{(i)}_x/2\), one finds that

Eventually, we obtain

where we defined \(\theta _k = (2k+1)\theta\) . Taking the square root of the largest eigenvalue and neglecting terms of order \({\mathcal {O}}(1/2^n)\) , we find the optimized \({\text {TS}}\) after the k -th step of the pure-state algorithm as

with \(f(k)=\cos [4(2k+1)\theta ]\) .
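As a numerical cross-check of this section (assuming NumPy; all helper names are ours), one can simulate the pure-state algorithm for a small register with \(\omega =\left| 0 \right\rangle ^{\otimes n}\), build \(\Gamma\) from the collective spin operators, and take the square root of its largest eigenvalue; consistent with Fig. 1 of the main text, the resulting \({\text {TS}}\) peaks around \(k_{\rm {Gr}}/2\).

```python
import numpy as np
from functools import reduce

def collective_spin(n):
    """Collective operators J_x, J_y, J_z = sum_i sigma_i / 2 on n qubits."""
    paulis = [np.array([[0, 1], [1, 0]]), np.array([[0, -1j], [1j, 0]]),
              np.array([[1, 0], [0, -1]])]
    Js = []
    for s in paulis:
        J = np.zeros((2 ** n, 2 ** n), dtype=complex)
        for i in range(n):
            factors = [np.eye(2)] * n
            factors[i] = s / 2
            J += reduce(np.kron, factors)
        Js.append(J)
    return Js

def grover_states(n, omega=0):
    """States |psi_k> of the pure-state algorithm, k = 0, ..., k_Gr."""
    N = 2 ** n
    psi = np.full(N, N ** -0.5)
    states = [psi]
    for _ in range(int(round(np.pi / 4 * np.sqrt(N)))):
        psi = psi.copy()
        psi[omega] = -psi[omega]        # oracle: sign flip on |omega>
        psi = 2 * psi.mean() - psi      # diffusion: 2|s><s| - 1
        states.append(psi)
    return states

def trace_speed(psi, Js):
    """sqrt of the largest eigenvalue of Gamma_ij = 4(Re<J_i J_j> - <J_i><J_j>)."""
    m = [np.real(np.vdot(psi, J @ psi)) for J in Js]
    G = np.array([[4 * (np.real(np.vdot(Js[i] @ psi, Js[j] @ psi)) - m[i] * m[j])
                   for j in range(3)] for i in range(3)])
    return np.sqrt(np.linalg.eigvalsh(G)[-1])
```

For n = 5 the TS starts at \(\sqrt{n}\) for the product state \(\left| \psi _{\rm {in}} \right\rangle\), rises above it near the middle of the run, and drops back to roughly \(\sqrt{n}\) at the (almost product) final state.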

Partial depolarization

Here, we discuss the case of an interruption of the search algorithm at an early step \(k\le k_{\rm {Gr}}/2\) , i.e., at a step where the pure state algorithm would not have reached its maximal \({\text {TS}}\) yet. The cost for stopping the algorithm at step k is given by \(C_{\rm {qu}}(k)=(k+1)/(\epsilon (k)\sin ^2((2k+1)\theta ))\) , see Eq. ( 1 ) in the main text. The \({\text {TS}}\) at step k still fulfills \({\text {TS}}(k)=\epsilon (k){\text {TS}}_{\rm {pure}}(k)\) , where \({\text {TS}}_{\rm {pure}}(k)\) is the \({\text {TS}}\) in the pure state algorithm, see Fig.  1 in the main text. Therefore, we have

where \({\text {TS}}_{\rm {max}}\) is the maximal \({\text {TS}}\) until the interruption step k and we regrouped all other factors into a ( k ). To further examine this expression, we use the exact form of the pure state \({\text {TS}}\) , \({\text {TS}}_{\rm {pure}}(k)\) , cf. Eq. ( 13 ). We can then compare the factor a ( k ) with the factor \(b=K\sqrt{2^n}\sqrt{n(n+1)/2}\) from the current bound, Eq. ( 9 ) in the main text. By writing \(x=k/\sqrt{2^n}\) , one finds for large n

For \(0\le x\le \pi /8\) ( \(0\le k\le \pi /8 \sqrt{2^n} = k_{\rm {Gr}}/2\) ) and using \(K\approx 0.69\) , one finds that \(a(k)-b>0\) . Therefore, the bound of Eq. ( 9 ) still holds for the regime \(k\le k_{\rm {Gr}}/2\) .

References

Nielsen, M. A. & Chuang, I. Quantum Computation and Quantum Information (Cambridge University Press, Cambridge, 2000).

Kaye, P. et al. An Introduction to Quantum Computing (Oxford University Press, Oxford, 2007).

Vedral, V. The elusive source of quantum speedup. Found. Phys. 40 , 1141–1154 (2010).

Jozsa, R. Entanglement and quantum computation. Preprint at arXiv:quant-ph/9707034 (1997).

Ekert, A. & Jozsa, R. Quantum algorithms: Entanglement-enhanced information processing. Phil. Trans. R. Soc. Lond. A 356 , 1769–1782 (1998).

Lloyd, S. Quantum search without entanglement. Phys. Rev. A 61 , 010301(R) (1999).

Linden, N. & Popescu, S. Good dynamics versus bad kinematics: Is entanglement needed for quantum computation?. Phys. Rev. Lett. 87 , 047901 (2001).

Jozsa, R. & Linden, N. On the role of entanglement in quantum-computational speed-up. Proc. Roy. Soc. A 459 , 2011–2032 (2003).

Vidal, G. Efficient classical simulation of slightly entangled quantum computations. Phys. Rev. Lett. 91 , 147902 (2003).

Biham, E., Brassard, G., Kenigsberg, D. & Mor, T. Quantum computing without entanglement. Theor. Comp. Sci. 320 , 15–33 (2004).

Kenigsberg, D., Mor, T. & Ratsaby, G. Quantum advantage without entanglement. Quantum Inf. Comp. 6 , 606–615 (2006).

Horodecki, R., Horodecki, P., Horodecki, M. & Horodecki, K. Quantum entanglement. Rev. Mod. Phys. 81 , 865–942 (2009).

Van den Nest, M. Universal quantum computation with little entanglement. Phys. Rev. Lett. 110 , 060504 (2013).

Bernstein, E. & Vazirani, U. Quantum complexity theory. SIAM J. Comp. 26 , 1411–1473 (1997).

Datta, A., Shaji, A. & Caves, C. M. Quantum discord and the power of one qubit. Phys. Rev. Lett. 100 , 050502 (2008).

Hillery, M. Coherence as a resource in decision problems: The Deutsch-Jozsa algorithm and a variation. Phys. Rev. A 93, 012111 (2016).

Ma, J., Yadin, B., Girolami, D., Vedral, V. & Gu, M. Converting coherence to quantum correlations. Phys. Rev. Lett. 116 , 160407 (2016).

Matera, J. M., Egloff, D., Killoran, N. & Plenio, M. B. Coherent control of quantum systems as a resource theory. Quantum Sci. Technol. 1 , 01LT01 (2016).

Howard, M., Wallman, J., Veitch, V. & Emerson, J. Contextuality supplies the magic for quantum computation. Nature 510 , 351–355 (2014).

Cai, Y., Le, H. N. & Scarani, V. State complexity and quantum computation. Ann. Phys. 527 , 684–700 (2015).

Stahlke, D. Quantum interference as a resource for quantum speedup. Phys. Rev. A 90 , 022302 (2014).

Wootters, W. K. Statistical distance and Hilbert space. Phys. Rev. D 23, 357–362 (1981).

Petz, D. Monotone metrics on matrix spaces. Linear Algebra Appl. 244 , 81–96 (1996).

Spehner, D. Quantum correlations and distinguishability of quantum states. J. Math. Phys. 55 , 075211 (2014).

Braunstein, S. L. & Caves, C. M. Statistical distance and the geometry of quantum states. Phys. Rev. Lett. 72 , 3439–3443 (1994).

Gessner, M. & Smerzi, A. Statistical speed of quantum states: Generalized quantum Fisher information and Schatten speed. Phys. Rev. A 97, 022109 (2018).

Helstrom, C. W. Quantum Detection and Estimation Theory (Academic Press, New York, 1976).

Pezzé, L. & Smerzi, A. Entanglement, nonlinear dynamics, and the Heisenberg limit. Phys. Rev. Lett. 102, 100401 (2009).

Hyllus, P. et al. Fisher information and multiparticle entanglement. Phys. Rev. A 85 , 022321 (2012).

Tóth, G. Multipartite entanglement and high-precision metrology. Phys. Rev. A 85 , 022322 (2012).

Pezzè, L., Smerzi, A., Oberthaler, M. K., Schmied, R. & Treutlein, P. Quantum metrology with nonclassical states of atomic ensembles. Rev. Mod. Phys. 90 , 035005 (2016).

Tóth, G. & Apellaniz, I. Quantum metrology from a quantum information science perspective. J. Phys. A 47 , 424006 (2014).

Grover, L. K. Quantum mechanics helps in searching for a needle in a haystack. Phys. Rev. Lett. 79 , 325 (1997).

Braunstein, S. L. & Pati, A. K. Speed-up and entanglement in quantum searching. Quantum Inform. Comput. 2 , 399 (2002).

Marvian, I. & Spekkens, R. W. Extending Noether’s theorem by quantifying the asymmetry of quantum states. Nat. Commun. 5, 3821 (2014).

Streltsov, A., Adesso, G. & Plenio, M. B. Colloquium: Quantum coherence as a resource. Rev. Mod. Phys. 89 , 041003 (2017).

Zalka, C. Grover’s quantum searching algorithm is optimal. Phys. Rev. A 60 , 2746–2751 (1999).

Bennett, C. H., Bernstein, E., Brassard, G. & Vazirani, U. Strengths and weaknesses of quantum computing. SIAM J. Comp. 26 , 1510–1523 (1997).

Caves, C. M., Deutsch, I. H. & Blume-Kohout, R. Physical-resource requirements and the power of quantum computation. J. Opt. B: Quantum Semiclass. Opt. 6 , S801–S806 (2004).

Bhattacharya, N., van Linden van den Heuvell, H. B. & Spreeuw, R. J. C. Implementation of quantum search algorithm using classical Fourier optics. Phys. Rev. Lett. 88, 137901 (2002).

Bruß, D. & Macchiavello, C. Multipartite entanglement in quantum algorithms. Phys. Rev. A 83 , 052313 (2011).

Meyer, D. A. & Wallach, N. R. Global entanglement in multiparticle systems. J. Math. Phys. 43 , 4273–4278 (2002).

Fang, Y. et al. Entanglement in the Grover search algorithm. Phys. Lett. A 345, 265–272 (2005).

Rungta, P. The quadratic speedup in Grover’s search algorithm from the entanglement perspective. Phys. Lett. A 373, 2652–2659 (2009).

Rossi, M., Bruß, D. & Macchiavello, C. Scale invariance of entanglement dynamics in Grover’s quantum search algorithm. Phys. Rev. A 87, 022331 (2013).

Shi, H.-L. et al. Coherence depletion in the Grover quantum search algorithm. Phys. Rev. A 95, 032307 (2017).

Anand, N. & Pati, A. K. Coherence and entanglement monogamy in the discrete analogue of analog Grover search. Preprint at arXiv:1611.04542 (2016).

Pan, M. & Qiu, D. Operator coherence dynamics in Grover’s quantum search algorithm. Phys. Rev. A 100, 012349 (2019).

Cory, D. G., Fahmy, A. F. & Havel, T. F. Ensemble quantum computing by nuclear magnetic resonance spectroscopy. Proc. Natl. Acad. Sci. USA 94, 1634 (1997).

Biham, E. & Kenigsberg, D. Grover’s quantum search algorithm for an arbitrary initial mixed state. Phys. Rev. A 66, 062301 (2002).

Kay, A. Degree of quantum correlation required to speed up a computation. Phys. Rev. A 92 , 062329 (2015).

Cohn, I., De Oliveira, A. L. F., Buksman, E. & De Lacalle, J. G. L. Grover’s search with local and total depolarizing channel errors: Complexity analysis. Int. J. Quantum Inf. 14 , 1650009 (2016).

Vrana, P., Reeb, D., Reitzner, D. & Wolf, M. M. Fault-ignorant quantum search. New J. Phys. 16 , 073033 (2014).

Wilde, M. M. Quantum Information Theory (Cambridge University Press, Cambridge, 2013).

Marvian, I. & Spekkens, R. W. How to quantify coherence: Distinguishing speakable and unspeakable notions. Phys. Rev. A 94 , 052324 (2016).

Strobel, H. et al. Fisher information and entanglement of non-Gaussian spin states. Science 345 , 424–427 (2014).

Pezzè, L., Li, Y., Li, W. & Smerzi, A. Witnessing entanglement without entanglement witness operators. Proc. Natl. Acad. Sci. USA 113 , 11459–11464 (2016).

Acknowledgements

The authors acknowledge financial support from the European Union’s Horizon 2020 research and innovation programme—Qombs Project, FET Flagship on Quantum Technologies Grant no. 820419.

Author information

Authors and affiliations.

QSTAR, INO-CNR and LENS, Largo Enrico Fermi 2, 50125, Firenze, Italy

Valentin Gebhart, Luca Pezzè & Augusto Smerzi

Università degli Studi di Napoli Federico II, Via Cinthia 21, 80126, Napoli, Italy

Valentin Gebhart

Contributions

V.G., L.P. and A.S. all contributed equally to this work.

Corresponding author

Correspondence to Valentin Gebhart .

Ethics declarations

Competing interests.

The authors declare no competing interests.

Additional information

Publisher's note.

Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.

Rights and permissions

Open Access This article is licensed under a Creative Commons Attribution 4.0 International License, which permits use, sharing, adaptation, distribution and reproduction in any medium or format, as long as you give appropriate credit to the original author(s) and the source, provide a link to the Creative Commons licence, and indicate if changes were made. The images or other third party material in this article are included in the article's Creative Commons licence, unless indicated otherwise in a credit line to the material. If material is not included in the article's Creative Commons licence and your intended use is not permitted by statutory regulation or exceeds the permitted use, you will need to obtain permission directly from the copyright holder. To view a copy of this licence, visit http://creativecommons.org/licenses/by/4.0/ .

About this article

Cite this article.

Gebhart, V., Pezzè, L. & Smerzi, A. Quantifying computational advantage of Grover’s algorithm with the trace speed. Sci. Rep. 11, 1288 (2021). https://doi.org/10.1038/s41598-020-80153-z

Received: 29 April 2020

Accepted: 09 November 2020

Published: 14 January 2021

DOI: https://doi.org/10.1038/s41598-020-80153-z

Machine Learning: Algorithms, Real-World Applications and Research Directions

  • Review Article
  • Published: 22 March 2021
  • Volume 2, article number 160 (2021)

  • Iqbal H. Sarker   ORCID: orcid.org/0000-0003-1740-5517 1 , 2  

In the current age of the Fourth Industrial Revolution (4IR or Industry 4.0), the digital world has a wealth of data, such as Internet of Things (IoT) data, cybersecurity data, mobile data, business data, social media data, health data, etc. To intelligently analyze these data and develop the corresponding smart and automated applications, the knowledge of artificial intelligence (AI), particularly machine learning (ML), is the key. Various types of machine learning algorithms, such as supervised, unsupervised, semi-supervised, and reinforcement learning, exist in the area. Besides, deep learning, which is part of a broader family of machine learning methods, can intelligently analyze data on a large scale. In this paper, we present a comprehensive view on these machine learning algorithms that can be applied to enhance the intelligence and the capabilities of an application. Thus, this study’s key contribution is explaining the principles of different machine learning techniques and their applicability in various real-world application domains, such as cybersecurity systems, smart cities, healthcare, e-commerce, agriculture, and many more. We also highlight the challenges and potential research directions based on our study. Overall, this paper aims to serve as a reference point for both academia and industry professionals as well as for decision-makers in various real-world situations and application areas, particularly from the technical point of view.

Introduction

We live in the age of data, where everything around us is connected to a data source and everything in our lives is digitally recorded [21, 103]. For instance, the current electronic world has a wealth of various kinds of data, such as Internet of Things (IoT) data, cybersecurity data, smart city data, business data, smartphone data, social media data, health data, COVID-19 data, and many more. The data can be structured, semi-structured, or unstructured, discussed briefly in Sect. “Types of Real-World Data and Machine Learning Techniques”, and is increasing day by day. Extracting insights from these data can be used to build various intelligent applications in the relevant domains. For instance, to build a data-driven automated and intelligent cybersecurity system, the relevant cybersecurity data can be used [105]; to build personalized context-aware smart mobile applications, the relevant mobile data can be used [103]; and so on. Thus, data management tools and techniques that are capable of extracting insights or useful knowledge from the data in a timely and intelligent way are urgently needed, as the real-world applications are based on them.

Figure 1. The worldwide popularity score of various types of ML algorithms (supervised, unsupervised, semi-supervised, and reinforcement) in a range of 0 (min) to 100 (max) over time; the x-axis represents the timestamp and the y-axis the corresponding popularity score.

Artificial intelligence (AI), particularly machine learning (ML), has grown rapidly in recent years in the context of data analysis and computing, typically allowing applications to function in an intelligent manner [95]. ML usually provides systems with the ability to learn and improve from experience automatically without being specifically programmed and is generally regarded as one of the most popular technologies of the fourth industrial revolution (4IR or Industry 4.0) [103, 105]. “Industry 4.0” [114] is typically the ongoing automation of conventional manufacturing and industrial practices, including exploratory data processing, using new smart technologies such as machine learning automation. Thus, to intelligently analyze these data and to develop the corresponding real-world applications, machine learning algorithms are the key. The learning algorithms can be categorized into four major types: supervised, unsupervised, semi-supervised, and reinforcement learning [75], discussed briefly in Sect. “Types of Real-World Data and Machine Learning Techniques”. The popularity of these learning approaches is increasing day by day, as shown in Fig. 1, based on data collected from Google Trends [4] over the last five years. The x-axis of the figure indicates the specific dates, and the corresponding popularity score within the range of 0 (minimum) to 100 (maximum) is shown on the y-axis. According to Fig. 1, the popularity indication values for these learning types were low in 2015 and have been increasing day by day. These statistics motivate us to study machine learning in this paper, which can play an important role in the real world through Industry 4.0 automation.

In general, the effectiveness and the efficiency of a machine learning solution depend on the nature and characteristics of the data and the performance of the learning algorithms. In the area of machine learning algorithms, classification analysis, regression, data clustering, feature engineering and dimensionality reduction, association rule learning, and reinforcement learning techniques exist to effectively build data-driven systems [41, 125]. Besides, deep learning, which originated from the artificial neural network and is part of a wider family of machine learning approaches, can be used to intelligently analyze data [96]. Thus, selecting a proper learning algorithm that is suitable for the target application in a particular domain is challenging. The reason is that different learning algorithms serve different purposes, and even the outcomes of different learning algorithms in a similar category may vary depending on the data characteristics [106]. Thus, it is important to understand the principles of various machine learning algorithms and their applicability in various real-world application areas, such as IoT systems, cybersecurity services, business and recommendation systems, smart cities, healthcare and COVID-19, context-aware systems, sustainable agriculture, and many more, which are explained briefly in Sect. “Applications of Machine Learning”.

Based on the importance and potentiality of “Machine Learning” to analyze the data mentioned above, in this paper, we provide a comprehensive view on various types of machine learning algorithms that can be applied to enhance the intelligence and the capabilities of an application. Thus, the key contribution of this study is explaining the principles and potentiality of different machine learning techniques, and their applicability in various real-world application areas mentioned earlier. The purpose of this paper is, therefore, to provide a basic guide for those academia and industry people who want to study, research, and develop data-driven automated and intelligent systems in the relevant areas based on machine learning techniques.

The key contributions of this paper are listed as follows:

To define the scope of our study by taking into account the nature and characteristics of various types of real-world data and the capabilities of various learning techniques.

To provide a comprehensive view on machine learning algorithms that can be applied to enhance the intelligence and capabilities of a data-driven application.

To discuss the applicability of machine learning-based solutions in various real-world application domains.

To highlight and summarize the potential research directions within the scope of our study for intelligent data analysis and services.

The rest of the paper is organized as follows. The next section presents the types of data and machine learning algorithms in a broader sense and defines the scope of our study. We briefly discuss and explain different machine learning algorithms in the subsequent section followed by which various real-world application areas based on machine learning algorithms are discussed and summarized. In the penultimate section, we highlight several research issues and potential future directions, and the final section concludes this paper.

Types of Real-World Data and Machine Learning Techniques

Machine learning algorithms typically consume and process data to learn the related patterns about individuals, business processes, transactions, events, and so on. In the following, we discuss various types of real-world data as well as categories of machine learning algorithms.

Types of Real-World Data

Usually, the availability of data is considered as the key to construct a machine learning model or data-driven real-world systems [ 103 , 105 ]. Data can be of various forms, such as structured, semi-structured, or unstructured [ 41 , 72 ]. Besides, the “metadata” is another type that typically represents data about the data. In the following, we briefly discuss these types of data.

Structured: Structured data have a well-defined structure and conform to a data model following a standard order; they are highly organized, easily accessed, and readily used by an entity or a computer program. Structured data are typically stored in well-defined schemes, such as relational databases, i.e., in a tabular format. For instance, names, dates, addresses, credit card numbers, stock information, geolocation, etc. are examples of structured data.

Unstructured: On the other hand, unstructured data have no pre-defined format or organization, which makes them much more difficult to capture, process, and analyze; they mostly contain text and multimedia material. For example, sensor data, emails, blog entries, wikis, word processing documents, PDF files, audio files, videos, images, presentations, web pages, and many other types of business documents can be considered unstructured data.

Semi-structured: Semi-structured data are not stored in a relational database like the structured data mentioned above, but they do have certain organizational properties that make them easier to analyze. HTML, XML, JSON documents, NoSQL databases, etc. are some examples of semi-structured data.

Metadata: It is not a normal form of data, but “data about data”. The primary difference between “data” and “metadata” is that data are simply the material that can classify, measure, or even document something relative to an organization’s data properties, whereas metadata describes the relevant data information, giving it more significance for data users. Basic examples of a document’s metadata are its author, its file size, the date the document was generated, and keywords that describe the document.
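To make the data types above concrete, the short Python sketch below (using only the standard library; all field values invented for illustration) encodes the same kind of record as structured, semi-structured, unstructured, and metadata:

```python
import csv
import io
import json

# Structured: a fixed schema, stored in tabular form (as in a relational table).
table = "name,date,city\nAlice,2021-03-22,Dhaka\nBob,2021-03-23,Sydney\n"
rows = list(csv.DictReader(io.StringIO(table)))

# Semi-structured: no rigid schema, but organizational markers (keys, nesting)
# that make it easier to analyze than free text, e.g. a JSON document.
doc = json.loads('{"name": "Alice", "orders": [{"id": 1, "items": ["pen", "ink"]}]}')

# Unstructured: raw text with no pre-defined organization at all.
email_body = "Hi Bob, the shipment arrives Tuesday. Thanks, Alice"

# Metadata: data about the data, not the content itself.
metadata = {"author": "Alice", "file_size_bytes": len(table),
            "created": "2021-03-22", "keywords": ["customers", "orders"]}
```

Note that only the structured rows can be queried by a fixed column name; the JSON document must be navigated by keys, and the email body would need text mining before any analysis.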

In the area of machine learning and data science, researchers use various widely used datasets for different purposes. These are, for example, cybersecurity datasets such as NSL-KDD [ 119 ], UNSW-NB15 [ 76 ], ISCX’12 [ 1 ], CIC-DDoS2019 [ 2 ], Bot-IoT [ 59 ], etc., smartphone datasets such as phone call logs [ 84 , 101 ], SMS Log [ 29 ], mobile application usages logs [ 137 ] [ 117 ], mobile phone notification logs [ 73 ] etc., IoT data [ 16 , 57 , 62 ], agriculture and e-commerce data [ 120 , 138 ], health data such as heart disease [ 92 ], diabetes mellitus [ 83 , 134 ], COVID-19 [ 43 , 74 ], etc., and many more in various application domains. The data can be in different types discussed above, which may vary from application to application in the real world. To analyze such data in a particular problem domain, and to extract the insights or useful knowledge from the data for building the real-world intelligent applications, different types of machine learning techniques can be used according to their learning capabilities, which is discussed in the following.

Types of Machine Learning Techniques

Machine Learning algorithms are mainly divided into four categories: Supervised learning, Unsupervised learning, Semi-supervised learning, and Reinforcement learning [ 75 ], as shown in Fig. 2 . In the following, we briefly discuss each type of learning technique with the scope of their applicability to solve real-world problems.

Figure 2. Various types of machine learning techniques.

Supervised: Supervised learning is typically the task of machine learning to learn a function that maps an input to an output based on sample input-output pairs [ 41 ]. It uses labeled training data and a collection of training examples to infer a function. Supervised learning is carried out when certain goals are identified to be accomplished from a certain set of inputs [ 105 ], i.e., a task-driven approach . The most common supervised tasks are “classification” that separates the data, and “regression” that fits the data. For instance, predicting the class label or sentiment of a piece of text, like a tweet or a product review, i.e., text classification, is an example of supervised learning.

Unsupervised: Unsupervised learning analyzes unlabeled datasets without the need for human interference, i.e., a data-driven process [ 41 ]. This is widely used for extracting generative features, identifying meaningful trends and structures, groupings in results, and exploratory purposes. The most common unsupervised learning tasks are clustering, density estimation, feature learning, dimensionality reduction, finding association rules, anomaly detection, etc.

Semi-supervised: Semi-supervised learning can be defined as a hybridization of the above-mentioned supervised and unsupervised methods, as it operates on both labeled and unlabeled data [ 41 , 105 ]. Thus, it falls between learning “without supervision” and learning “with supervision”. In the real world, labeled data could be rare in several contexts, and unlabeled data are numerous, where semi-supervised learning is useful [ 75 ]. The ultimate goal of a semi-supervised learning model is to provide a better outcome for prediction than that produced using the labeled data alone from the model. Some application areas where semi-supervised learning is used include machine translation, fraud detection, labeling data and text classification.

Reinforcement: Reinforcement learning is a type of machine learning algorithm that enables software agents and machines to automatically evaluate the optimal behavior in a particular context or environment to improve its efficiency [52], i.e., an environment-driven approach. This type of learning is based on reward or penalty, and its ultimate goal is to use insights obtained from the environment to take action to increase the reward or minimize the risk [75]. It is a powerful tool for training AI models that can help increase automation or optimize the operational efficiency of sophisticated systems such as robotics, autonomous driving tasks, manufacturing, and supply chain logistics; however, it is not preferable for solving basic or straightforward problems.

Thus, to build effective models in various application areas different types of machine learning techniques can play a significant role according to their learning capabilities, depending on the nature of the data discussed earlier, and the target outcome. In Table 1 , we summarize various types of machine learning techniques with examples. In the following, we provide a comprehensive view of machine learning algorithms that can be applied to enhance the intelligence and capabilities of a data-driven application.
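The task-driven vs. data-driven distinction can be illustrated with a deliberately tiny sketch in plain Python (a toy example of ours, not from the paper): a nearest-centroid classifier learns from the labels (supervised), while a simple two-means clustering groups the same values without them (unsupervised).

```python
# Toy 1-D data: feature values and, for the supervised case, class labels.
points = [1.0, 1.2, 0.8, 5.0, 5.3, 4.7]
labels = ["low", "low", "low", "high", "high", "high"]

# Supervised (task-driven): learn from labeled examples, then predict.
def fit_nearest_centroid(xs, ys):
    centroids = {}
    for c in set(ys):
        vals = [x for x, y in zip(xs, ys) if y == c]
        centroids[c] = sum(vals) / len(vals)
    return lambda x: min(centroids, key=lambda c: abs(x - centroids[c]))

# Unsupervised (data-driven): group the raw values without any labels.
def two_means(xs, iters=10):
    c0, c1 = min(xs), max(xs)
    for _ in range(iters):
        g0 = [x for x in xs if abs(x - c0) <= abs(x - c1)]
        g1 = [x for x in xs if abs(x - c0) > abs(x - c1)]
        c0, c1 = sum(g0) / len(g0), sum(g1) / len(g1)
    return sorted([c0, c1])

predict = fit_nearest_centroid(points, labels)
```

A semi-supervised variant would first cluster all values and then propagate the few available labels to each group, while a reinforcement learner would instead adjust its behavior from rewards rather than from a fixed dataset.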

Machine Learning Tasks and Algorithms

In this section, we discuss various machine learning algorithms, including classification analysis, regression analysis, data clustering, association rule learning, feature engineering for dimensionality reduction, and deep learning methods. A general structure of a machine learning-based predictive model is shown in Fig. 3 , where the model is trained from historical data in phase 1 and the outcome is generated in phase 2 for the new test data.

figure 3

A general structure of a machine learning based predictive model considering both the training and testing phase

Classification Analysis

Classification is regarded as a supervised learning method in machine learning; it refers to a predictive modeling problem in which a class label is predicted for a given example [ 41 ]. Mathematically, it maps a function ( f ) from input variables ( X ) to output variables ( Y ), the targets, labels, or categories. Classification can be carried out on structured or unstructured data to predict the class of given data points. For example, spam detection, with the labels “spam” and “not spam”, in email service providers is a classification problem. In the following, we summarize the common classification problems.

Binary classification: It refers to the classification tasks having two class labels such as “true and false” or “yes and no” [ 41 ]. In such binary classification tasks, one class could be the normal state, while the abnormal state could be another class. For instance, “cancer not detected” is the normal state of a task that involves a medical test, and “cancer detected” could be considered as the abnormal state. Similarly, “spam” and “not spam” in the above example of email service providers are considered as binary classification.

Multiclass classification: Traditionally, this refers to classification tasks having more than two class labels [ 41 ]. Unlike binary classification tasks, multiclass classification does not have the notion of normal and abnormal outcomes. Instead, examples are classified as belonging to one of a range of specified classes. For example, classifying the various types of network attacks in the NSL-KDD [ 119 ] dataset is a multiclass classification task, where the attack categories are classified into four class labels: DoS (Denial of Service Attack), U2R (User to Root Attack), R2L (Root to Local Attack), and Probing Attack.

Multi-label classification: In machine learning, multi-label classification is an important consideration in which an example is associated with several classes or labels. Thus, it is a generalization of multiclass classification in which the classes involved in the problem are hierarchically structured, and each example may simultaneously belong to more than one class at each hierarchical level, e.g., multi-level text classification. For instance, a Google News story can be presented under the categories of a “city name”, “technology”, or “latest news”, etc. Multi-label classification involves advanced machine learning algorithms that support predicting multiple mutually non-exclusive classes or labels, unlike traditional classification tasks where class labels are mutually exclusive [ 82 ].

Many classification algorithms have been proposed in the machine learning and data science literature [ 41 , 125 ]. In the following, we summarize the most common and popular methods that are used widely in various application areas.

Naive Bayes (NB): The naive Bayes algorithm is based on Bayes’ theorem with the assumption of independence between each pair of features [ 51 ]. It works well in many real-world situations, such as document or text classification and spam filtering, and can be used for both binary and multi-class categories. The NB classifier can also be used to effectively classify noisy instances in the data and to construct a robust prediction model [ 94 ]. Its key benefit is that, compared to more sophisticated approaches, it needs only a small amount of training data to estimate the necessary parameters quickly [ 82 ]. However, its performance may be affected by its strong assumption of feature independence. Gaussian, Multinomial, Complement, Bernoulli, and Categorical are the common variants of the NB classifier [ 82 ].
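As an illustration of the idea (not the implementation in any particular library), a minimal Gaussian naive Bayes sketch in pure Python follows; the function names `fit_gaussian_nb` and `predict_gaussian_nb` are our own illustrative choices:

```python
import math
from collections import defaultdict

def fit_gaussian_nb(X, y):
    """Estimate per-class priors, feature means, and feature variances."""
    by_class = defaultdict(list)
    for xi, yi in zip(X, y):
        by_class[yi].append(xi)
    n = len(y)
    stats = {}
    for c, rows in by_class.items():
        means = [sum(col) / len(rows) for col in zip(*rows)]
        # Small constant avoids zero variance for constant features.
        vars_ = [sum((v - m) ** 2 for v in col) / len(rows) + 1e-9
                 for col, m in zip(zip(*rows), means)]
        stats[c] = (len(rows) / n, means, vars_)
    return stats

def predict_gaussian_nb(stats, x):
    """Pick the class maximizing log prior + sum of log Gaussian likelihoods,
    which is valid under the naive feature-independence assumption."""
    def log_post(c):
        prior, means, vars_ = stats[c]
        ll = sum(-0.5 * math.log(2 * math.pi * v) - (xi - m) ** 2 / (2 * v)
                 for xi, m, v in zip(x, means, vars_))
        return math.log(prior) + ll
    return max(stats, key=log_post)
```

Working in log space avoids numerical underflow when many small per-feature probabilities are multiplied.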

Linear Discriminant Analysis (LDA): Linear Discriminant Analysis (LDA) is a linear decision boundary classifier created by fitting class conditional densities to data and applying Bayes’ rule [ 51 , 82 ]. This method is also known as a generalization of Fisher’s linear discriminant, which projects a given dataset into a lower-dimensional space, i.e., a reduction of dimensionality that minimizes the complexity of the model or reduces the resulting model’s computational cost. The standard LDA model fits each class with a Gaussian density, assuming that all classes share the same covariance matrix [ 82 ]. LDA is closely related to ANOVA (analysis of variance) and regression analysis, which seek to express one dependent variable as a linear combination of other features or measurements.

Logistic regression (LR): Another common probability-based statistical model used to solve classification problems in machine learning is logistic regression (LR) [ 64 ]. Logistic regression typically uses a logistic function, the sigmoid function defined mathematically in Eq. 1 , to estimate the probabilities. It works well when the dataset can be separated linearly but can overfit high-dimensional datasets. Regularization (L1 and L2) techniques [ 82 ] can be used to avoid over-fitting in such scenarios. The assumed linearity between the dependent and independent variables is considered a major drawback of logistic regression. It can be used for both classification and regression problems, but it is more commonly used for classification.
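The core of logistic regression, mapping a linear score through the sigmoid to a probability, can be sketched in a few lines; the helper names are illustrative, and the weights are assumed to have been learned already:

```python
import math

def sigmoid(z):
    """Logistic (sigmoid) function: maps any real score to (0, 1)."""
    return 1.0 / (1.0 + math.exp(-z))

def predict_proba(weights, bias, x):
    """Probability of the positive class for feature vector x:
    sigmoid of the linear combination bias + w . x."""
    score = bias + sum(w * xi for w, xi in zip(weights, x))
    return sigmoid(score)
```

Thresholding the returned probability at 0.5 gives the usual binary class prediction.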

K-nearest neighbors (KNN): K-Nearest Neighbors (KNN) [ 9 ] is an “instance-based learning” or non-generalizing learning algorithm, also known as a “lazy learning” algorithm. It does not construct a general internal model; instead, it stores all instances corresponding to the training data in n -dimensional space. KNN classifies new data points based on similarity measures (e.g., the Euclidean distance function) [ 82 ]. Classification is computed from a simple majority vote of the k nearest neighbors of each point. KNN is quite robust to noisy training data, and its accuracy depends on the data quality. The biggest issue with KNN is choosing the optimal number of neighbors to consider. KNN can be used for classification as well as regression.
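A minimal sketch of the majority-vote idea, assuming Euclidean distance and pure Python (the function name is illustrative):

```python
import math
from collections import Counter

def knn_predict(train_X, train_y, x, k=3):
    """Classify x by majority vote among its k nearest training points,
    using Euclidean distance. No model is built up front ('lazy learning');
    all work happens at prediction time."""
    dists = sorted(
        (math.dist(xi, x), yi) for xi, yi in zip(train_X, train_y)
    )
    votes = Counter(yi for _, yi in dists[:k])
    return votes.most_common(1)[0][0]
```

Note the cost: every prediction scans the full training set, which is why KNN scales poorly to very large datasets without index structures.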

Support vector machine (SVM): In machine learning, another common technique that can be used for classification, regression, or other tasks is the support vector machine (SVM) [ 56 ]. In high- or infinite-dimensional space, a support vector machine constructs a hyper-plane or set of hyper-planes. Intuitively, the hyper-plane that has the greatest distance from the nearest training data points in any class achieves a strong separation since, in general, the greater the margin, the lower the classifier’s generalization error. SVM is effective in high-dimensional spaces and can behave differently based on different mathematical functions known as kernels. Linear, polynomial, radial basis function (RBF), and sigmoid are popular kernel functions used in SVM classifiers [ 82 ]. However, when the data set contains more noise, such as overlapping target classes, SVM does not perform well.

Decision tree (DT): The decision tree (DT) [ 88 ] is a well-known non-parametric supervised learning method. DT learning methods are used for both classification and regression tasks [ 82 ]. ID3 [ 87 ], C4.5 [ 88 ], and CART [ 20 ] are well-known DT algorithms. Moreover, the recently proposed BehavDT [ 100 ] and IntrudTree [ 97 ] by Sarker et al. are effective in the relevant application domains, such as user behavior analytics and cybersecurity analytics, respectively. DT classifies instances by sorting them down the tree from the root to some leaf node, as shown in Fig. 4 : starting at the root node, each instance is classified by checking the attribute defined by that node and then moving down the tree branch corresponding to the attribute’s value. For splitting, the most popular criteria are “gini” for the Gini impurity and “entropy” for the information gain, which can be expressed mathematically as \(Gini = 1 - \sum _{i} (p_i)^2\) and \(Entropy = - \sum _{i} p_i \log _2 p_i\) , where \(p_i\) is the proportion of instances belonging to class i [ 82 ].
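Both splitting criteria can be computed directly from the class proportions at a node; a minimal pure-Python sketch (illustrative function names):

```python
import math
from collections import Counter

def gini(labels):
    """Gini impurity: 1 - sum of squared class proportions.
    0 means a pure node; higher means more mixed."""
    n = len(labels)
    return 1.0 - sum((c / n) ** 2 for c in Counter(labels).values())

def entropy(labels):
    """Entropy: -sum of p * log2(p) over class proportions.
    Used to compute the information gain of a candidate split."""
    n = len(labels)
    return -sum((c / n) * math.log2(c / n) for c in Counter(labels).values())
```

A split is chosen by comparing the parent node's impurity with the weighted impurity of the child nodes it would produce.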

figure 4

An example of a decision tree structure

figure 5

An example of a random forest structure considering multiple decision trees

Random forest (RF): A random forest classifier [ 19 ] is well known as an ensemble classification technique that is used in the field of machine learning and data science in various application areas. This method uses “parallel ensembling” which fits several decision tree classifiers in parallel, as shown in Fig. 5 , on different data set sub-samples and uses majority voting or averages for the outcome or final result. It thus minimizes the over-fitting problem and increases the prediction accuracy and control [ 82 ]. Therefore, the RF learning model with multiple decision trees is typically more accurate than a single decision tree based model [ 106 ]. To build a series of decision trees with controlled variation, it combines bootstrap aggregation (bagging) [ 18 ] and random feature selection [ 11 ]. It is adaptable to both classification and regression problems and fits well for both categorical and continuous values.

Adaptive Boosting (AdaBoost): Adaptive Boosting (AdaBoost) is an ensemble learning process that employs an iterative approach to improve poor classifiers by learning from their errors. It was developed by Yoav Freund et al. [ 35 ] and is also known as “meta-learning”. Unlike the random forest, which uses parallel ensembling, AdaBoost uses “sequential ensembling”. It creates a powerful classifier by combining many poorly performing classifiers into a single classifier of high accuracy. In that sense, AdaBoost is called an adaptive classifier, as it significantly improves the efficiency of the classifier, but in some instances it can trigger overfitting. AdaBoost is best used to boost the performance of decision trees (the base estimator [ 82 ]) on binary classification problems; however, it is sensitive to noisy data and outliers.

Extreme gradient boosting (XGBoost): Gradient Boosting, like Random Forests [ 19 ] above, is an ensemble learning algorithm that generates a final model based on a series of individual models, typically decision trees. The gradient is used to minimize the loss function, similar to how neural networks [ 41 ] use gradient descent to optimize weights. Extreme Gradient Boosting (XGBoost) is a form of gradient boosting that takes more detailed approximations into account when determining the best model [ 82 ]. It computes second-order gradients of the loss function to minimize loss and applies advanced regularization (L1 and L2) [ 82 ], which reduces over-fitting and improves model generalization and performance. XGBoost is fast to interpret and can handle large-sized datasets well.

Stochastic gradient descent (SGD): Stochastic gradient descent (SGD) [ 41 ] is an iterative method for optimizing an objective function with suitable smoothness properties, where the word ‘stochastic’ refers to random probability. Because each iteration uses only a single randomly chosen training example (or a small batch), SGD reduces the computational burden, particularly in high-dimensional optimization problems, allowing for faster iterations in exchange for a lower convergence rate. A gradient is the slope of a function that calculates a variable’s degree of change in response to another variable’s changes. Mathematically, gradient descent is applied to a differentiable (often convex) function, using the partial derivatives with respect to its input parameters. Let \(\alpha\) be the learning rate and \(J_i\) the cost of the \(i \mathrm{th}\) training example; then Eq. ( 4 ), \(w^{(j+1)} := w^{(j)} - \alpha \, \frac{\partial J_i}{\partial w}\) , represents the stochastic gradient descent weight update at the \(j^\mathrm{th}\) iteration. In large-scale and sparse machine learning, SGD has been successfully applied to problems often encountered in text classification and natural language processing [ 82 ]. However, SGD is sensitive to feature scaling and needs a range of hyperparameters, such as the regularization parameter and the number of iterations.
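A minimal sketch of the per-example update for a squared-error linear model, with an illustrative function name and a fixed shuffling seed for reproducibility; real implementations add learning-rate schedules and regularization:

```python
import random

def sgd_linear(X, y, lr=0.05, epochs=300, seed=0):
    """Fit y ~ w.x + b by stochastic gradient descent: each step uses
    the gradient of a single example's squared loss 0.5 * (pred - y)^2."""
    rng = random.Random(seed)
    w = [0.0] * len(X[0])
    b = 0.0
    idx = list(range(len(X)))
    for _ in range(epochs):
        rng.shuffle(idx)  # 'stochastic': visit examples in random order
        for i in idx:
            pred = b + sum(wj * xj for wj, xj in zip(w, X[i]))
            err = pred - y[i]  # d(loss)/d(pred) for the squared loss
            w = [wj - lr * err * xj for wj, xj in zip(w, X[i])]
            b -= lr * err
    return w, b
```

On noiseless data generated from y = 2x + 1, the recovered weight and bias converge close to 2 and 1 respectively.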

Rule-based classification: The term rule-based classification can be used to refer to any classification scheme that makes use of IF-THEN rules for class prediction. Several classification algorithms with the ability to generate rules exist, such as Zero-R [ 125 ], One-R [ 47 ], decision trees [ 87 , 88 ], DTNB [ 110 ], Ripple Down Rule learner (RIDOR) [ 125 ], and Repeated Incremental Pruning to Produce Error Reduction (RIPPER) [ 126 ]. The decision tree is one of the most common rule-based classification algorithms among these techniques because it has several advantages, such as being easy to interpret; the ability to handle high-dimensional data; simplicity and speed; good accuracy; and the capability to produce rules that are clear and understandable to humans [ 127 , 128 ]. The decision tree-based rules also provide significant accuracy in a prediction model for unseen test cases [ 106 ]. Since the rules are easily interpretable, these rule-based classifiers are often used to produce descriptive models that can describe a system, including its entities and their relationships.

figure 6

Classification vs. regression. In classification the dotted line represents a linear boundary that separates the two classes; in regression, the dotted line models the linear relationship between the two variables

Regression Analysis

Regression analysis includes several methods of machine learning that allow one to predict a continuous ( y ) result variable based on the value of one or more ( x ) predictor variables [ 41 ]. The most significant distinction between classification and regression is that classification predicts distinct class labels, while regression facilitates the prediction of a continuous quantity. Figure 6 shows an example of how classification differs from regression models. Some overlap is often found between the two types of machine learning algorithms. Regression models are now widely used in a variety of fields, including financial forecasting or prediction, cost estimation, trend analysis, marketing, time series estimation, drug response modeling, and many more. Some of the familiar types of regression algorithms are linear, polynomial, lasso, and ridge regression, etc., which are explained briefly in the following.

Simple and multiple linear regression: This is one of the most popular ML modeling techniques as well as a well-known regression technique. In this technique, the dependent variable is continuous, the independent variable(s) can be continuous or discrete, and the form of the regression line is linear. Linear regression creates a relationship between the dependent variable ( Y ) and one or more independent variables ( X ), also known as the regression line, using the best-fit straight line [ 41 ]. It is defined by the following equations:

\(y = a + bx + e\)     (5)

\(y = a + b_1x_1 + b_2x_2 + \dots + b_nx_n + e\)     (6)

where a is the intercept, b is the slope of the line, and e is the error term. This equation can be used to predict the value of the target variable based on the given predictor variable(s). Multiple linear regression is an extension of simple linear regression that allows two or more predictor variables to model a response variable, y, as a linear function [ 41 ], defined in Eq. 6 , whereas simple linear regression has only one independent variable, defined in Eq. 5 .
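For the simple (one-variable) case, the least-squares estimates have a closed form: b = cov(x, y) / var(x) and a = mean(y) - b * mean(x). A minimal sketch (illustrative function name):

```python
def fit_simple_linear(xs, ys):
    """Ordinary least squares for y = a + b*x:
    slope b = cov(x, y) / var(x), intercept a = mean(y) - b * mean(x)."""
    n = len(xs)
    mx = sum(xs) / n
    my = sum(ys) / n
    b = sum((x - mx) * (yv - my) for x, yv in zip(xs, ys)) / \
        sum((x - mx) ** 2 for x in xs)
    a = my - b * mx
    return a, b
```

On data generated exactly from y = 3 + 2x, the fit recovers the intercept 3 and slope 2.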

Polynomial regression: Polynomial regression is a form of regression analysis in which the relationship between the independent variable x and the dependent variable y is not linear but is modeled as an \(n^\mathrm{th}\) -degree polynomial in x [ 82 ]. The equation for polynomial regression is derived from the linear regression (polynomial regression of degree 1) equation and is defined as below:

\(y = b_0 + b_1x + b_2x^2 + \dots + b_nx^n\)

Here, y is the predicted/target output, \(b_0, b_1, \ldots , b_n\) are the regression coefficients, and x is the independent/input variable. In simple words, if the data are not distributed linearly but follow an \(n^\mathrm{th}\) -degree polynomial, then we use polynomial regression to get the desired output.

LASSO and ridge regression: LASSO and ridge regression are well known as powerful techniques typically used for building learning models in the presence of a large number of features, due to their capability of preventing over-fitting and reducing the complexity of the model. The LASSO (least absolute shrinkage and selection operator) regression model uses the L 1 regularization technique [ 82 ], whose shrinkage penalizes the “absolute value of the magnitude of coefficients” ( L 1 penalty). As a result, LASSO can shrink coefficients all the way to zero. Thus, LASSO regression aims to find the subset of predictors that minimizes the prediction error for a quantitative response variable. On the other hand, ridge regression uses L 2 regularization [ 82 ], which penalizes the “squared magnitude of coefficients” ( L 2 penalty). Thus, ridge regression forces the weights to be small but never sets a coefficient value to zero, yielding a non-sparse solution. Overall, LASSO regression is useful for obtaining a subset of predictors by eliminating less important features, and ridge regression is useful when a data set has “multicollinearity”, i.e., predictors that are correlated with other predictors.

Cluster Analysis

Cluster analysis, also known as clustering, is an unsupervised machine learning technique for identifying and grouping related data points in large datasets without concern for a specific outcome. It groups a collection of objects in such a way that objects in the same category, called a cluster, are in some sense more similar to each other than objects in other groups [ 41 ]. It is often used as a data analysis technique to discover interesting trends or patterns in data, e.g., groups of consumers based on their behavior. Clustering can be used in a broad range of application areas, such as cybersecurity, e-commerce, mobile data processing, health analytics, user modeling, and behavioral analytics. In the following, we briefly discuss and summarize various types of clustering methods.

Partitioning methods: Based on the features and similarities in the data, this clustering approach categorizes the data into multiple groups or clusters. The data scientists or analysts typically determine the number of clusters to produce, either dynamically or statically, depending on the nature of the target application. The most common clustering algorithms based on partitioning methods are K-means [ 69 ], K-Medoids [ 80 ], CLARA [ 55 ], etc.

Density-based methods: To identify distinct groups or clusters, these methods use the concept that a cluster in the data space is a contiguous region of high point density isolated from other such clusters by contiguous regions of low point density. Points that are not part of any cluster are considered noise. The typical density-based clustering algorithms are DBSCAN [ 32 ], OPTICS [ 12 ], etc. Density-based methods typically struggle with clusters of varying density and with high-dimensional data.

Hierarchical-based methods: Hierarchical clustering seeks to construct a hierarchy of clusters, i.e., a tree structure. Strategies for hierarchical clustering generally fall into two types: (i) agglomerative, a “bottom-up” approach in which each observation begins in its own cluster and pairs of clusters are merged as one moves up the hierarchy, and (ii) divisive, a “top-down” approach in which all observations begin in one cluster and splits are performed recursively as one moves down the hierarchy, as shown in Fig 7 . Our earlier proposed BOTS technique, Sarker et al. [ 102 ], is an example of a hierarchical, particularly bottom-up, clustering algorithm.

Grid-based methods: To deal with massive datasets, grid-based clustering is especially suitable. To obtain clusters, the principle is first to summarize the dataset with a grid representation and then to combine grid cells. STING [ 122 ], CLIQUE [ 6 ], etc. are the standard algorithms of grid-based clustering.

Model-based methods: There are mainly two types of model-based clustering algorithms: those that use statistical learning and those based on neural network learning [ 130 ]. For instance, GMM [ 89 ] is an example of a statistical learning method, and SOM [ 22 , 96 ] is an example of a neural network learning method.

Constraint-based methods: Constrained-based clustering is a semi-supervised approach to data clustering that uses constraints to incorporate domain knowledge. Application or user-oriented constraints are incorporated to perform the clustering. The typical algorithms of this kind of clustering are COP K-means [ 121 ], CMWK-Means [ 27 ], etc.

figure 7

A graphical interpretation of the widely-used hierarchical clustering (Bottom-up and top-down) technique

Many clustering algorithms with the ability to group data have been proposed in the machine learning and data science literature [ 41 , 125 ]. In the following, we summarize the popular methods that are used widely in various application areas.

K-means clustering: K-means clustering [ 69 ] is a fast, robust, and simple algorithm that provides reliable results when the data sets are well separated from each other. The data points are allocated to clusters in such a way that the sum of the squared distances between the data points and the centroid is as small as possible. In other words, the K-means algorithm identifies k centroids and then assigns each data point to the nearest cluster, keeping the within-cluster variance as small as possible. Since it begins with a random selection of cluster centers, the results can be inconsistent. Since extreme values can easily affect a mean, the K-means clustering algorithm is sensitive to outliers. K-medoids clustering [ 91 ] is a variant of K-means that is more robust to noise and outliers.
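The alternating assign-and-update procedure (Lloyd's algorithm) can be sketched as follows; for determinism, this sketch seeds the centroids with the first k points, whereas practical implementations use random or k-means++ initialization:

```python
import math

def kmeans(points, k, iters=20):
    """Lloyd's algorithm: alternately assign each point to its nearest
    centroid, then move each centroid to the mean of its assigned points.
    Deterministic seeding with the first k points keeps this sketch
    reproducible (not how production implementations initialize)."""
    centroids = [list(p) for p in points[:k]]
    for _ in range(iters):
        clusters = [[] for _ in range(k)]
        for p in points:
            j = min(range(k), key=lambda j: math.dist(p, centroids[j]))
            clusters[j].append(p)
        # Recompute each centroid as the mean of its cluster;
        # keep the old centroid if a cluster ended up empty.
        centroids = [[sum(c) / len(cl) for c in zip(*cl)] if cl else centroids[j]
                     for j, cl in enumerate(clusters)]
    return centroids, clusters
```

On two well-separated blobs, the algorithm recovers the two groups after a couple of iterations.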

Mean-shift clustering: Mean-shift clustering [ 37 ] is a nonparametric clustering technique that does not require prior knowledge of the number of clusters or constraints on cluster shape. Mean-shift clustering aims to discover “blobs” in a smooth distribution or density of samples [ 82 ]. It is a centroid-based algorithm that works by updating centroid candidates to be the mean of the points in a given region. To form the final set of centroids, these candidates are filtered in a post-processing stage to remove near-duplicates. Cluster analysis in computer vision and image processing are examples of application domains. Mean Shift has the disadvantage of being computationally expensive. Moreover, in cases of high dimension, where the number of clusters shifts abruptly, the mean-shift algorithm does not work well.

DBSCAN: Density-based spatial clustering of applications with noise (DBSCAN) [ 32 ] is a base algorithm for density-based clustering which is widely used in data mining and machine learning. It is a non-parametric density-based clustering technique for separating high-density clusters from low-density clusters. DBSCAN’s main idea is that a point belongs to a cluster if it is close to many points from that cluster. It can find clusters of various shapes and sizes in vast volumes of data that are noisy and contain outliers. Unlike k-means, DBSCAN does not require a priori specification of the number of clusters in the data and can find arbitrarily shaped clusters. Although k-means is much faster than DBSCAN, DBSCAN is efficient at finding high-density regions and outliers, i.e., it is robust to outliers.

GMM clustering: Gaussian mixture models (GMMs) are often used for data clustering, which is a distribution-based clustering algorithm. A Gaussian mixture model is a probabilistic model in which all the data points are produced by a mixture of a finite number of Gaussian distributions with unknown parameters [ 82 ]. To find the Gaussian parameters for each cluster, an optimization algorithm called expectation-maximization (EM) [ 82 ] can be used. EM is an iterative method that uses a statistical model to estimate the parameters. In contrast to k-means, Gaussian mixture models account for uncertainty and return the likelihood that a data point belongs to one of the k clusters. GMM clustering is more robust than k-means and works well even with non-linear data distributions.

Agglomerative hierarchical clustering: The most common method of hierarchical clustering used to group objects in clusters based on their similarity is agglomerative clustering. This technique uses a bottom-up approach, where each object is first treated as a singleton cluster by the algorithm. Following that, pairs of clusters are merged one by one until all clusters have been merged into a single large cluster containing all objects. The result is a dendrogram, which is a tree-based representation of the elements. Single linkage [ 115 ], Complete linkage [ 116 ], BOTS [ 102 ] etc. are some examples of such techniques. The main advantage of agglomerative hierarchical clustering over k-means is that the tree-structure hierarchy generated by agglomerative clustering is more informative than the unstructured collection of flat clusters returned by k-means, which can help to make better decisions in the relevant application areas.

Dimensionality Reduction and Feature Learning

In machine learning and data science, high-dimensional data processing is a challenging task for both researchers and application developers. Thus, dimensionality reduction, which is an unsupervised learning technique, is important because it leads to better human interpretation, lower computational costs, and the avoidance of overfitting and redundancy by simplifying models. Both feature selection and feature extraction can be used for dimensionality reduction. The primary distinction between them is that “feature selection” keeps a subset of the original features [ 97 ], while “feature extraction” creates brand-new ones [ 98 ]. In the following, we briefly discuss these techniques.

Feature selection: The selection of features, also known as the selection of variables or attributes in the data, is the process of choosing a subset of unique features (variables, predictors) to use in building a machine learning and data science model. It decreases a model’s complexity by eliminating irrelevant or less important features and allows for faster training of machine learning algorithms. A right and optimal subset of selected features in a problem domain is capable of minimizing the overfitting problem by simplifying and generalizing the model, and it increases the model’s accuracy [ 97 ]. Thus, “feature selection” [ 66 , 99 ] is considered one of the primary concepts in machine learning, greatly affecting the effectiveness and efficiency of the target machine learning model. The chi-squared test, the analysis of variance (ANOVA) test, Pearson’s correlation coefficient, and recursive feature elimination are some popular techniques that can be used for feature selection.

Feature extraction: In a machine learning-based model or system, feature extraction techniques usually provide a better understanding of the data, a way to improve prediction accuracy, and reduced computational cost or training time. The aim of “feature extraction” [ 66 , 99 ] is to reduce the number of features in a dataset by generating new ones from the existing ones and then discarding the original features. The majority of the information found in the original set of features can then be summarized using this new, reduced set of features. For instance, principal component analysis (PCA) is often used as a dimensionality-reduction technique to extract a lower-dimensional space, creating brand-new components from the existing features in a dataset [ 98 ].

Many algorithms have been proposed to reduce data dimensions in the machine learning and data science literature [ 41 , 125 ]. In the following, we summarize the popular methods that are used widely in various application areas.

Variance threshold: A simple baseline approach to feature selection is the variance threshold [ 82 ]. This excludes all features of low variance, i.e., all features whose variance does not exceed the threshold. By default, it eliminates all zero-variance features, i.e., features that have the same value in all samples. This feature selection algorithm looks only at the ( X ) features and does not need the ( y ) outputs; it can therefore be used for unsupervised learning.

Pearson correlation: Pearson’s correlation is another method for understanding a feature’s relation to the response variable and can be used for feature selection [ 99 ]. This method is also used for finding the association between features in a dataset. The resulting value lies in \([-1, 1]\) , where \(-1\) means perfect negative correlation, \(+1\) means perfect positive correlation, and 0 means that the two variables do not have a linear correlation. If X and Y are two random variables, then the correlation coefficient between X and Y is defined as [ 41 ]

\(r(X, Y) = \frac{cov(X, Y)}{\sigma _X \, \sigma _Y}\)

where cov( X , Y ) is the covariance of X and Y , and \(\sigma _X\) and \(\sigma _Y\) are their standard deviations.
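Pearson's correlation can be computed directly from its definition, the covariance divided by the product of the standard deviations; a minimal sketch (illustrative function name):

```python
import math

def pearson(xs, ys):
    """Sample Pearson correlation: cov(X, Y) / (std(X) * std(Y)).
    The shared 1/n factors cancel, so raw sums suffice."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sx = math.sqrt(sum((x - mx) ** 2 for x in xs))
    sy = math.sqrt(sum((y - my) ** 2 for y in ys))
    return cov / (sx * sy)
```

A perfectly increasing linear relationship yields +1, a perfectly decreasing one yields -1.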

ANOVA: Analysis of variance (ANOVA) is a statistical tool used to test whether the mean values of two or more groups differ significantly from each other. ANOVA assumes a linear relationship between the variables and the target, as well as the variables’ normal distribution. To statistically test the equality of means, the ANOVA method utilizes F tests. For feature selection, the resulting ‘ANOVA F value’ [ 82 ] of this test can be used to omit certain features that are independent of the goal variable.

Chi square: The chi-square \({\chi }^2\) [ 82 ] statistic is an estimate of the difference between the observed and expected frequencies of a series of events or variables. \({\chi }^2\) depends on the magnitude of the difference between the actual and observed values, the degrees of freedom, and the sample size. The chi-square \({\chi }^2\) is commonly used for testing relationships between categorical variables. If \(O_i\) represents an observed value and \(E_i\) represents an expected value, then

\({\chi }^2 = \sum _{i} \frac{(O_i - E_i)^2}{E_i}\)
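The statistic is a simple sum over observed and expected cell counts; a minimal sketch for one-dimensional frequencies (illustrative function name):

```python
def chi_square(observed, expected):
    """Chi-square statistic: sum over cells of (O_i - E_i)^2 / E_i.
    Larger values indicate a bigger departure from the expected counts."""
    return sum((o - e) ** 2 / e for o, e in zip(observed, expected))
```

In practice the statistic is compared against a chi-square distribution with the appropriate degrees of freedom to obtain a p-value.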

Recursive feature elimination (RFE): Recursive feature elimination (RFE) is a brute-force approach to feature selection. RFE [ 82 ] fits the model and repeatedly removes the weakest feature until the specified number of features is reached. Features are ranked by the model’s coefficients or feature importances. By recursively removing a small number of features per iteration, RFE aims to eliminate dependencies and collinearity in the model.

Model-based selection: To reduce the dimensionality of the data, linear models penalized with L 1 regularization can be used. Least absolute shrinkage and selection operator (LASSO) regression is a type of linear regression that has the property of shrinking some of the coefficients to zero [ 82 ]; such features can then be removed from the model. Thus, the penalized LASSO regression method is often used in machine learning to select a subset of variables. The Extra Trees Classifier [ 82 ] is an example of a tree-based estimator that can be used to compute impurity-based feature importances, which can then be used to discard irrelevant features.

Principal component analysis (PCA): Principal component analysis (PCA) is a well-known unsupervised learning approach in the field of machine learning and data science. PCA is a mathematical technique that transforms a set of correlated variables into a set of uncorrelated variables known as principal components [ 48 , 81 ]. Figure 8 shows an example of the effect of PCA on various dimension spaces, where Fig. 8 a shows the original features in 3D space, and Fig. 8 b shows the principal components PC1 and PC2 projected onto a 2D plane and a 1D line with the principal component PC1, respectively. Thus, PCA can be used as a feature extraction technique that reduces the dimensionality of a dataset and helps to build an effective machine learning model [ 98 ]. Technically, PCA identifies the eigenvectors of a covariance matrix with the highest eigenvalues and then uses those to project the data into a new subspace of equal or fewer dimensions [ 82 ].
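To make the eigenvector view concrete, here is a pure-Python sketch for the 2-D case only, which finds the leading eigenvector of the 2x2 covariance matrix in closed form (illustrative, not a general PCA implementation):

```python
import math

def pca_first_component(points):
    """First principal component of 2-D data: the unit eigenvector of the
    2x2 covariance matrix associated with the largest eigenvalue."""
    n = len(points)
    mx = sum(p[0] for p in points) / n
    my = sum(p[1] for p in points) / n
    cxx = sum((p[0] - mx) ** 2 for p in points) / n
    cyy = sum((p[1] - my) ** 2 for p in points) / n
    cxy = sum((p[0] - mx) * (p[1] - my) for p in points) / n
    # Largest eigenvalue of [[cxx, cxy], [cxy, cyy]] via the quadratic formula
    tr, det = cxx + cyy, cxx * cyy - cxy * cxy
    lam = tr / 2 + math.sqrt(tr * tr / 4 - det)
    # Corresponding eigenvector: (cxy, lam - cxx), unless cxy is ~0
    vx, vy = (cxy, lam - cxx) if abs(cxy) > 1e-12 else (1.0, 0.0)
    norm = math.hypot(vx, vy)
    return vx / norm, vy / norm
```

For points lying on the line y = x, the leading component points along (1, 1) normalized, i.e., roughly (0.707, 0.707).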

Figure 8: An example of principal component analysis (PCA) and the created principal components PC1 and PC2 in different dimension spaces
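The covariance-matrix view of PCA described above can be written directly in NumPy (an illustrative from-scratch sketch, not a substitute for a library implementation):

```python
# Minimal PCA: eigendecompose the covariance matrix and project the
# centred data onto the eigenvectors with the largest eigenvalues.
import numpy as np

def pca(X, n_components):
    Xc = X - X.mean(axis=0)                 # centre the data
    cov = np.cov(Xc, rowvar=False)          # covariance matrix
    eigvals, eigvecs = np.linalg.eigh(cov)  # symmetric eigendecomposition
    order = np.argsort(eigvals)[::-1]       # sort by decreasing eigenvalue
    components = eigvecs[:, order[:n_components]]
    return Xc @ components                  # project into the new subspace

rng = np.random.RandomState(0)
X = rng.randn(100, 3)
Z = pca(X, 2)                               # 3D -> 2D, as in Fig. 8
print(Z.shape)
```

The first principal component captures the largest share of the variance, the second the largest share of what remains, and so on.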

Association Rule Learning

Association rule learning is a rule-based machine learning approach for discovering interesting relationships between variables in large datasets, expressed as “IF-THEN” statements [ 7 ]. One example is that “if a customer buys a computer or laptop (an item), s/he is likely to also buy anti-virus software (another item) at the same time”. Association rules are employed today in many application areas, including IoT services, medical diagnosis, usage behavior analytics, web usage mining, smartphone applications, cybersecurity applications, and bioinformatics. In comparison to sequence mining, association rule learning does not usually take into account the order of items within or across transactions. A common way of measuring the usefulness of an association rule is through its ‘support’ and ‘confidence’ parameters, introduced in [ 7 ].
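The two measures are easy to state in code. The following toy sketch computes them for a candidate rule (the transactions and item names are made up for illustration):

```python
# Support and confidence on toy transactions.
transactions = [
    {"laptop", "antivirus"},
    {"laptop", "antivirus", "mouse"},
    {"laptop", "mouse"},
    {"antivirus"},
]

def support(itemset, transactions):
    # Fraction of transactions containing the whole itemset.
    return sum(itemset <= t for t in transactions) / len(transactions)

def confidence(antecedent, consequent, transactions):
    # Of the transactions containing the antecedent, the fraction
    # that also contain the consequent.
    return support(antecedent | consequent, transactions) / support(antecedent, transactions)

# Rule: IF {laptop} THEN {antivirus}
print(support({"laptop", "antivirus"}, transactions))       # 2 of 4 transactions
print(confidence({"laptop"}, {"antivirus"}, transactions))  # 2 of 3 laptop buyers
```

A rule is usually reported only if both values exceed user-chosen minimum thresholds.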

In the data mining literature, many association rule learning methods have been proposed, such as logic dependent [ 34 ], frequent pattern based [ 8 , 49 , 68 ], and tree-based [ 42 ]. The most popular association rule learning algorithms are summarized below.

AIS and SETM: AIS, proposed by Agrawal et al. [ 7 ], is the first algorithm for association rule mining. The AIS algorithm’s main downside is that too many candidate itemsets are generated, requiring more space and wasting a lot of effort, and it requires too many passes over the entire dataset to produce the rules. Another approach, SETM [ 49 ], exhibits good performance and stable execution time; however, it suffers from the same flaw as the AIS algorithm.

Apriori: For generating association rules from a given dataset, Agrawal et al. [ 8 ] proposed the Apriori, Apriori-TID, and Apriori-Hybrid algorithms. These algorithms outperform AIS and SETM, mentioned above, due to the Apriori property of frequent itemsets [ 8 ]. The term ‘Apriori’ refers to the use of prior knowledge of frequent itemset properties. Apriori uses a “bottom-up” approach to generate candidate itemsets and reduces the search space with the property that “all subsets of a frequent itemset must be frequent; and if an itemset is infrequent, then all its supersets must also be infrequent”. Another approach, predictive Apriori [ 108 ], can also generate rules; however, it can produce unexpected results because it combines support and confidence into a single measure. Apriori [ 8 ] remains the most widely applied technique for mining association rules.
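The level-wise candidate generation and the pruning property can be sketched in plain Python (a compact illustration, not an optimized implementation):

```python
# Level-wise Apriori sketch: build candidates "bottom-up" and prune any
# candidate that has an infrequent subset (the Apriori property).
from itertools import combinations

def apriori(transactions, min_support):
    n = len(transactions)
    items = {i for t in transactions for i in t}
    freq = {}
    current = [frozenset([i]) for i in items]   # level-1 candidates
    k = 1
    while current:
        counts = {c: sum(c <= t for t in transactions) for c in current}
        level = {c: cnt / n for c, cnt in counts.items() if cnt / n >= min_support}
        freq.update(level)
        # Level k+1 candidates: unions of frequent k-itemsets whose
        # every k-subset is itself frequent.
        k += 1
        current = list({a | b for a, b in combinations(level, 2)
                        if len(a | b) == k
                        and all(frozenset(s) in level
                                for s in combinations(a | b, k - 1))})
    return freq

transactions = [{"a", "b", "c"}, {"a", "b"}, {"a", "c"}, {"b", "c"}, {"a", "b", "c"}]
freq = apriori(transactions, min_support=0.6)
print(sorted(map(sorted, freq)))
```

With a minimum support of 0.6, all single items and all pairs are frequent here, but the triple {a, b, c} (support 0.4) is pruned.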

ECLAT: This technique, proposed by Zaki et al. [ 131 ], stands for Equivalence Class Clustering and bottom-up Lattice Traversal. ECLAT uses a depth-first search to find frequent itemsets. In contrast to the Apriori algorithm [ 8 ], which represents data in a horizontal layout, ECLAT represents data vertically, which makes it more efficient and scalable for association rule learning. ECLAT is better suited to small and medium datasets, whereas the Apriori algorithm is preferred for large datasets.
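ECLAT's vertical representation can be illustrated in a few lines (a toy sketch with made-up items): each item maps to the set of transaction ids containing it (its "tid-list"), and the support of an itemset is the size of the intersection of its items' tid-lists.

```python
# Vertical (tid-list) layout as used by ECLAT.
transactions = [{"a", "b"}, {"a", "c"}, {"a", "b", "c"}, {"b", "c"}]

# Build the vertical layout: item -> set of transaction ids.
tidlists = {}
for tid, t in enumerate(transactions):
    for item in t:
        tidlists.setdefault(item, set()).add(tid)

# Support of {a, b} is |tids(a) & tids(b)| / n, with no rescan of the data.
support_ab = len(tidlists["a"] & tidlists["b"]) / len(transactions)
print(support_ab)
```

Counting by set intersection is what lets ECLAT avoid the repeated dataset scans of the horizontal layout.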

FP-Growth: Another common association rule learning technique, based on the frequent-pattern tree (FP-tree) proposed by Han et al. [ 42 ], is Frequent Pattern Growth, known as FP-Growth. The key difference from Apriori is that while the Apriori algorithm [ 8 ] generates frequent candidate itemsets, the FP-Growth algorithm [ 42 ] avoids candidate generation altogether and instead builds a tree using a ‘divide and conquer’ strategy. Due to its sophistication, however, the FP-tree is challenging to use in an interactive mining environment [ 133 ], and it may not fit into memory for massive datasets, making big data difficult to process as well. Another solution, RARM (Rapid Association Rule Mining), proposed by Das et al. [ 26 ], faces a related FP-tree issue [ 133 ].

ABC-RuleMiner: A rule-based machine learning method, recently proposed in our earlier paper by Sarker et al. [ 104 ], that discovers interesting non-redundant rules to provide real-world intelligent services. The algorithm identifies redundancy in associations by taking into account the impact or precedence of the related contextual features and discovers a set of non-redundant association rules. It first constructs an association generation tree (AGT) in a top-down manner and then extracts the association rules by traversing the tree. Thus, ABC-RuleMiner is more powerful than traditional rule-based methods in terms of both non-redundant rule generation and intelligent decision-making, particularly in a context-aware smart computing environment where human or user preferences are involved.

Among the association rule learning techniques discussed above, Apriori [ 8 ] is the most widely used algorithm for discovering association rules from a given dataset [ 133 ]. The main strength of the association learning technique is its comprehensiveness, as it generates all associations that satisfy the user-specified constraints, such as minimum support and confidence value. The ABC-RuleMiner approach [ 104 ] discussed earlier could give significant results in terms of non-redundant rule generation and intelligent decision-making for the relevant application areas in the real world.

Reinforcement Learning

Reinforcement learning (RL) is a machine learning technique that allows an agent to learn by trial and error in an interactive environment, using feedback from its own actions and experiences. Unlike supervised learning, which is based on given sample data or examples, the RL method is based on interacting with the environment. The problem to be solved in RL is defined as a Markov decision process (MDP) [ 86 ], i.e., it is all about making decisions sequentially. An RL problem typically includes four elements: agent, environment, rewards, and policy.

RL can be split roughly into model-based and model-free techniques. Model-based RL infers optimal behavior from a model of the environment: the agent performs actions and observes the results, including the next state and the immediate reward [ 85 ]. AlphaZero and AlphaGo [ 113 ] are examples of model-based approaches. A model-free approach, on the other hand, does not use the transition probability distribution or the reward function associated with the MDP. Q-learning, Deep Q-Network, Monte Carlo control, and SARSA (State–Action–Reward–State–Action) are examples of model-free algorithms [ 52 ]. The model of the environment, which is required for model-based RL but not for model-free RL, is the key difference between the two. In the following, we discuss the popular RL algorithms.

Monte Carlo methods: Monte Carlo techniques, or Monte Carlo experiments, are a wide category of computational algorithms that rely on repeated random sampling to obtain numerical results [ 52 ]. The underlying concept is to use randomness to solve problems that are deterministic in principle. Optimization, numerical integration, and generating draws from probability distributions are the three problem classes where Monte Carlo techniques are most commonly used.

Q-learning: Q-learning is a model-free reinforcement learning algorithm for learning the quality of actions, telling an agent what action to take under what circumstances [ 52 ]. It does not need a model of the environment (hence the term “model-free”), and it can deal with stochastic transitions and rewards without the need for adaptations. The ‘Q’ in Q-learning stands for quality: the algorithm estimates the maximum expected reward for a given action in a given state.
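The update rule can be sketched on a hypothetical 5-state chain environment; the environment, learning rate, discount factor, and exploration rate below are illustrative choices, not from the text:

```python
# Tabular Q-learning on a toy chain: the agent moves left (-1) or
# right (+1), and reaching state 4 yields reward 1.
import random

random.seed(0)
n_states, actions = 5, [-1, +1]
alpha, gamma, epsilon = 0.5, 0.9, 0.3   # learning rate, discount, exploration
Q = {(s, a): 0.0 for s in range(n_states) for a in actions}

for _ in range(500):                    # episodes
    s = 0
    while s != n_states - 1:
        # epsilon-greedy action selection
        if random.random() < epsilon:
            a = random.choice(actions)
        else:
            a = max(actions, key=lambda act: Q[(s, act)])
        s2 = min(max(s + a, 0), n_states - 1)
        r = 1.0 if s2 == n_states - 1 else 0.0
        # Q-learning update: move Q(s, a) toward r + gamma * max_a' Q(s', a')
        best_next = max(Q[(s2, act)] for act in actions)
        Q[(s, a)] += alpha * (r + gamma * best_next - Q[(s, a)])
        s = s2

# The learned greedy policy should be "move right" in every non-terminal state.
policy = [max(actions, key=lambda act: Q[(s, act)]) for s in range(n_states - 1)]
print(policy)
```

Note that the update needs only observed transitions, never a model of the environment, which is exactly the "model-free" property described above.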

Deep Q-learning: Q-learning works well when the setting is reasonably simple; however, when the number of states and actions becomes large, a deep neural network can be used as a function approximator. In Deep Q-Learning [ 52 ], the state is fed into the neural network, which returns the Q-values of all possible actions as output.

Reinforcement learning, along with supervised and unsupervised learning, is one of the basic machine learning paradigms. RL can be used to solve numerous real-world problems in various fields, such as game theory, control theory, operations analysis, information theory, simulation-based optimization, manufacturing, supply chain logistics, multi-agent systems, swarm intelligence, aircraft control, robot motion control, and many more.

Artificial Neural Network and Deep Learning

Deep learning is part of a wider family of artificial neural network (ANN)-based machine learning approaches with representation learning. Deep learning provides a computational architecture by combining several processing layers, such as input, hidden, and output layers, to learn from data [ 41 ]. The main advantage of deep learning over traditional machine learning methods is its better performance in several cases, particularly when learning from large datasets [ 105 , 129 ]. Figure 9 shows the general performance of deep learning versus traditional machine learning as the amount of data increases; however, the performance may vary depending on the data characteristics and experimental setup.

Figure 9: Machine learning and deep learning performance in general with the amount of data

The most common deep learning algorithms are the multi-layer perceptron (MLP), the convolutional neural network (CNN, or ConvNet), and the long short-term memory recurrent neural network (LSTM-RNN) [ 96 ]. In the following, we discuss various types of deep learning methods that can be used to build effective data-driven models for various purposes.

Figure 10: A structure of an artificial neural network model with multiple processing layers

MLP: The base architecture of deep learning, also known as the feed-forward artificial neural network, is the multilayer perceptron (MLP) [ 82 ]. A typical MLP is a fully connected network consisting of an input layer, one or more hidden layers, and an output layer, as shown in Fig. 10 . Each node in one layer connects to every node in the following layer with a certain weight. MLP utilizes the “backpropagation” technique [ 41 ], the most fundamental building block of neural network training, to adjust the weight values internally while building the model. MLP is sensitive to feature scaling and exposes a variety of hyperparameters to tune, such as the number of hidden layers, neurons, and iterations, which can make the model computationally costly.
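The fully connected layer structure can be sketched as a single forward pass in NumPy (the weights here are random stand-ins for the values backpropagation would learn):

```python
# Forward pass through a tiny MLP: 4 inputs -> 8 hidden units -> 3 classes.
import numpy as np

rng = np.random.RandomState(0)

def relu(x):
    return np.maximum(0.0, x)

def softmax(x):
    e = np.exp(x - x.max(axis=1, keepdims=True))
    return e / e.sum(axis=1, keepdims=True)

W1, b1 = rng.randn(4, 8) * 0.1, np.zeros(8)   # input -> hidden weights
W2, b2 = rng.randn(8, 3) * 0.1, np.zeros(3)   # hidden -> output weights

X = rng.randn(5, 4)                  # batch of 5 samples
hidden = relu(X @ W1 + b1)           # input layer -> hidden layer
probs = softmax(hidden @ W2 + b2)    # hidden layer -> class probabilities

print(probs.shape)
```

Training would repeat this pass, compare `probs` against the true labels, and adjust `W1`, `b1`, `W2`, `b2` by backpropagating the error.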

CNN or ConvNet: The convolutional neural network (CNN) [ 65 ] enhances the design of the standard ANN with convolutional layers, pooling layers, and fully connected layers, as shown in Fig. 11 . Because it takes advantage of the two-dimensional (2D) structure of the input data, it is broadly used in several areas such as image and video recognition, image processing and classification, medical image analysis, natural language processing, etc. Although a CNN carries a greater computational burden, it has the advantage of automatically detecting the important features without any manual intervention, and hence CNN is considered to be more powerful than a conventional ANN. A number of advanced CNN-based deep learning models can be used in the field, such as AlexNet [ 60 ], Xception [ 24 ], Inception [ 118 ], Visual Geometry Group (VGG) [ 44 ], and ResNet [ 45 ].
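The core convolution operation of such a layer can be sketched in NumPy (single channel, "valid" padding; the edge-detecting kernel is an illustrative choice):

```python
# Minimal 2D convolution: slide a kernel over the image and take the
# dot product of the kernel with each image patch.
import numpy as np

def conv2d(image, kernel):
    kh, kw = kernel.shape
    oh = image.shape[0] - kh + 1
    ow = image.shape[1] - kw + 1
    out = np.zeros((oh, ow))
    for i in range(oh):
        for j in range(ow):
            out[i, j] = np.sum(image[i:i + kh, j:j + kw] * kernel)
    return out

image = np.zeros((6, 6))
image[:, 3:] = 1.0                      # left half dark, right half bright
kernel = np.array([[1.0, -1.0],
                   [1.0, -1.0]])        # responds to vertical edges
fmap = conv2d(image, kernel)            # 6x6 input, 2x2 kernel -> 5x5 map
print(fmap.shape)
```

In a real CNN the kernel values are learned rather than hand-picked, many kernels run in parallel to produce multiple feature maps, and pooling layers then downsample those maps.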

LSTM-RNN: Long short-term memory (LSTM) is an artificial recurrent neural network (RNN) architecture used in the area of deep learning [ 38 ]. Unlike normal feed-forward neural networks, LSTM has feedback connections. LSTM networks are well suited to analyzing and learning from sequential data, such as classifying, processing, and predicting based on time series, which differentiates them from other conventional networks. Thus, LSTM can be used when the data are in a sequential format, such as time series or sentences, and it is commonly applied in time-series analysis, natural language processing, speech recognition, etc.

Figure 11: An example of a convolutional neural network (CNN or ConvNet) including multiple convolution and pooling layers

In addition to the most common deep learning methods discussed above, several other deep learning approaches [ 96 ] exist for various purposes. For instance, the self-organizing map (SOM) [ 58 ] uses unsupervised learning to represent high-dimensional data by a 2D grid map, thus achieving dimensionality reduction. The autoencoder (AE) [ 15 ] is another learning technique widely used for dimensionality reduction and feature extraction in unsupervised learning tasks. Restricted Boltzmann machines (RBM) [ 46 ] can be used for dimensionality reduction, classification, regression, collaborative filtering, feature learning, and topic modeling. A deep belief network (DBN) is typically composed of simple, unsupervised networks, such as restricted Boltzmann machines (RBMs) or autoencoders, and a backpropagation neural network (BPNN) [ 123 ]. A generative adversarial network (GAN) [ 39 ] is a form of deep learning network that can generate data with characteristics close to the actual input data. Transfer learning, typically the re-use of a pre-trained model on a new problem, is currently very common because it can train deep neural networks with comparatively little data [ 124 ]. A brief discussion of these artificial neural network (ANN) and deep learning (DL) models is given in our earlier paper, Sarker et al. [ 96 ].

Overall, based on the learning techniques discussed above, we can conclude that various types of machine learning techniques, such as classification analysis, regression, data clustering, feature selection and extraction, dimensionality reduction, association rule learning, reinforcement learning, and deep learning, can play a significant role for various purposes according to their capabilities. In the following section, we discuss several application areas based on machine learning algorithms.

Applications of Machine Learning

In the current age of the Fourth Industrial Revolution (4IR), machine learning has become popular in various application areas because of its capability to learn from the past and make intelligent decisions. In the following, we summarize and discuss ten popular application areas of machine learning technology.

Predictive analytics and intelligent decision-making: A major application field of machine learning is intelligent decision-making through data-driven predictive analytics [ 21 , 70 ]. The basis of predictive analytics is capturing and exploiting relationships between explanatory variables and predicted variables from previous events to predict the unknown outcome [ 41 ]. Examples include identifying suspects or criminals after a crime has been committed, or detecting credit card fraud as it happens. In another application, machine learning algorithms can assist retailers in better understanding consumer preferences and behavior, managing inventory, avoiding out-of-stock situations, and optimizing logistics and warehousing in e-commerce. Various machine learning algorithms, such as decision trees, support vector machines, and artificial neural networks [ 106 , 125 ], are commonly used in this area. Since accurate predictions provide insight into the unknown, they can improve the decisions of industries, businesses, and almost any organization, including government agencies, e-commerce, telecommunications, banking and financial services, healthcare, sales and marketing, transportation, social networking, and many others.

Cybersecurity and threat intelligence: Cybersecurity, typically the practice of protecting networks, systems, hardware, and data from digital attacks, is one of the most essential areas of Industry 4.0 [ 114 ]. Machine learning has become a crucial cybersecurity technology that constantly learns by analyzing data to identify patterns, better detect malware in encrypted traffic, find insider threats, predict where bad neighborhoods are online, keep people safe while browsing, or secure data in the cloud by uncovering suspicious activity. For instance, clustering techniques can be used to identify cyber-anomalies, policy violations, etc. To detect various types of cyber-attacks or intrusions, machine learning classification models that take into account the impact of security features are useful [ 97 ]. Various deep learning-based security models can also be used on large-scale security datasets [ 96 , 129 ]. Moreover, security policy rules generated by association rule learning techniques can play a significant role in building a rule-based security system [ 105 ]. Thus, the various learning techniques discussed in Sect. “ Machine Learning Tasks and Algorithms ” can enable cybersecurity professionals to be more proactive in efficiently preventing threats and cyber-attacks.

Internet of things (IoT) and smart cities: The Internet of Things (IoT) is another essential area of Industry 4.0 [ 114 ], which turns everyday objects into smart objects by allowing them to transmit data and automate tasks without the need for human interaction. IoT is, therefore, considered to be the big frontier that can enhance almost all activities in our lives, such as smart governance, smart home, education, communication, transportation, retail, agriculture, health care, business, and many more [ 70 ]. The smart city is one of IoT’s core fields of application, using technologies to enhance city services and residents’ living experiences [ 132 , 135 ]. As machine learning utilizes experience to recognize trends and create models that help predict future behavior and events, it has become a crucial technology for IoT applications [ 103 ]. For example, predicting traffic in smart cities, predicting parking availability, estimating the total energy usage of the citizens for a particular period, and making context-aware and timely decisions for people are some tasks that can be solved using machine learning techniques according to current needs.

Traffic prediction and transportation: Transportation systems have become a crucial component of every country’s economic development. Nonetheless, several cities around the world are experiencing an excessive rise in traffic volume, resulting in serious issues such as delays, traffic congestion, higher fuel prices, increased CO2 pollution, accidents, emergencies, and a decline in modern society’s quality of life [ 40 ]. Thus, an intelligent transportation system through predicting future traffic is important, which is an indispensable part of a smart city. Accurate traffic prediction based on machine and deep learning modeling can help to minimize the issues [ 17 , 30 , 31 ]. For example, based on the travel history and trend of traveling through various routes, machine learning can assist transportation companies in predicting possible issues that may occur on specific routes and recommending their customers to take a different path. Ultimately, these learning-based data-driven models help improve traffic flow, increase the usage and efficiency of sustainable modes of transportation, and limit real-world disruption by modeling and visualizing future changes.

Healthcare and COVID-19 pandemic: Machine learning can help to solve diagnostic and prognostic problems in a variety of medical domains, such as disease prediction, medical knowledge extraction, detecting regularities in data, patient management, etc. [ 33 , 77 , 112 ]. Coronavirus disease (COVID-19) is an infectious disease caused by a newly discovered coronavirus, according to the World Health Organization (WHO) [ 3 ]. Recently, learning techniques have become popular in the battle against COVID-19 [ 61 , 63 ]. For the COVID-19 pandemic, learning techniques are used to classify patients at high risk, their mortality rate, and other anomalies [ 61 ]. They can also be used to better understand the virus’s origin, to predict COVID-19 outbreaks, and for disease diagnosis and treatment [ 14 , 50 ]. With the help of machine learning, researchers can forecast where and when COVID-19 is likely to spread and notify those regions to make the required arrangements. Deep learning also provides exciting solutions to the problems of medical image processing and is seen as a crucial technique for potential applications, particularly for the COVID-19 pandemic [ 10 , 78 , 111 ]. Overall, machine and deep learning techniques can help fight the COVID-19 pandemic and support intelligent clinical decision-making in the healthcare domain.

E-commerce and product recommendations: Product recommendation is one of the most well known and widely used applications of machine learning, and it is one of the most prominent features of almost any e-commerce website today. Machine learning technology can assist businesses in analyzing their consumers’ purchasing histories and making customized product suggestions for their next purchase based on their behavior and preferences. E-commerce companies, for example, can easily position product suggestions and offers by analyzing browsing trends and click-through rates of specific items. Using predictive modeling based on machine learning techniques, many online retailers, such as Amazon [ 71 ], can better manage inventory, prevent out-of-stock situations, and optimize logistics and warehousing. The future of sales and marketing is the ability to capture, evaluate, and use consumer data to provide a customized shopping experience. Furthermore, machine learning techniques enable companies to create packages and content that are tailored to the needs of their customers, allowing them to maintain existing customers while attracting new ones.

NLP and sentiment analysis: Natural language processing (NLP) involves the reading and understanding of spoken or written language through the medium of a computer [ 79 , 103 ]. Thus, NLP helps computers, for instance, to read a text, hear speech, interpret it, analyze sentiment, and decide which aspects are significant, and machine learning techniques can be used for these tasks. Virtual personal assistants, chatbots, speech recognition, document description, and language or machine translation are some examples of NLP-related tasks. Sentiment analysis [ 90 ] (also referred to as opinion mining or emotion AI) is an NLP sub-field that seeks to identify and extract public mood and views within a given text through blogs, reviews, social media, forums, news, etc. For instance, businesses and brands use sentiment analysis to understand the social sentiment of their brand, product, or service through social media platforms or the web as a whole. Overall, sentiment analysis is considered a machine learning task that analyzes texts for polarity, such as “positive”, “negative”, or “neutral”, along with more fine-grained emotions such as very happy, happy, sad, very sad, angry, interested, or not interested.

Image, speech and pattern recognition: Image recognition [ 36 ] is a well-known and widespread example of machine learning in the real world that can identify an object in a digital image. For instance, labeling an x-ray as cancerous or not, character recognition, face detection in an image, and tagging suggestions on social media, e.g., Facebook, are common examples of image recognition. Speech recognition [ 23 ], which typically uses sound and linguistic models, is also very popular, e.g., in Google Assistant, Cortana, Siri, and Alexa [ 67 ], where machine learning methods are used. Pattern recognition [ 13 ] is defined as the automated recognition of patterns and regularities in data, e.g., in image analysis. Several machine learning techniques, such as classification, feature selection, clustering, and sequence labeling methods, are used in this area.

Sustainable agriculture: Agriculture is essential to the survival of all human activities [ 109 ]. Sustainable agriculture practices help to improve agricultural productivity while also reducing negative impacts on the environment [ 5 , 25 , 109 ]. Sustainable agriculture supply chains are knowledge-intensive and based on information, skills, technologies, etc., where knowledge transfer encourages farmers to enhance their decisions to adopt sustainable agriculture practices, utilizing the increasing amount of data captured by emerging technologies, e.g., the Internet of Things (IoT), mobile technologies and devices, etc. [ 5 , 53 , 54 ]. Machine learning can be applied in various phases of sustainable agriculture: in the pre-production phase, for the prediction of crop yield, soil properties, irrigation requirements, etc.; in the production phase, for weather prediction, disease detection, weed detection, soil nutrient management, livestock management, etc.; in the processing phase, for demand estimation, production planning, etc.; and in the distribution phase, for inventory management, consumer analysis, etc.

User behavior analytics and context-aware smartphone applications: Context-awareness is a system’s ability to capture knowledge about its surroundings at any moment and modify its behavior accordingly [ 28 , 93 ]. Context-aware computing uses software and hardware to automatically collect and interpret data for direct responses. The mobile app development environment has changed greatly with the power of AI, particularly machine learning techniques, through their ability to learn from contextual data [ 103 , 136 ]. Thus, the developers of mobile apps can rely on machine learning to create smart apps that can understand human behavior and support and entertain users [ 107 , 137 , 140 ]. Machine learning techniques are applicable to building various personalized data-driven context-aware systems, such as smart interruption management, smart mobile recommendation, and context-aware smart searching, with decision-making that intelligently assists mobile phone users in a pervasive computing environment. For example, context-aware association rules can be used to build an intelligent phone call application [ 104 ]. Clustering approaches are useful for capturing users’ diverse behavioral activities by taking into account time series data [ 102 ]. To predict future events in various contexts, classification methods can be used [ 106 , 139 ]. Thus, the various learning techniques discussed in Sect. “ Machine Learning Tasks and Algorithms ” can help to build context-aware, adaptive, and smart applications according to the preferences of mobile phone users.

In addition to these application areas, machine learning-based models can also apply to several other domains such as bioinformatics, cheminformatics, computer networks, DNA sequence classification, economics and banking, robotics, advanced engineering, and many more.

Challenges and Research Directions

Our study on machine learning algorithms for intelligent data analysis and applications opens several research issues in the area. Thus, in this section, we summarize and discuss the challenges faced and the potential research opportunities and future directions.

In general, the effectiveness and efficiency of a machine learning-based solution depend on the nature and characteristics of the data and on the performance of the learning algorithms. Collecting data in relevant domains such as cybersecurity, IoT, healthcare, and agriculture, discussed in Sect. “ Applications of Machine Learning ”, is not straightforward, although the current cyberspace enables the production of a huge amount of data at very high frequency. Thus, collecting useful data for the target machine learning-based applications, e.g., smart city applications, and managing it well is important for further analysis. Therefore, a more in-depth investigation of data collection methods is needed when working with real-world data. Moreover, historical data may contain many ambiguous values, missing values, outliers, and meaningless entries. The machine learning algorithms discussed in Sect. “ Machine Learning Tasks and Algorithms ” are highly affected by data quality and availability for training, and so, consequently, is the resultant model. Thus, accurately cleaning and pre-processing the diverse data collected from diverse sources is a challenging task. Therefore, effectively modifying or enhancing existing pre-processing methods, or proposing new data preparation techniques, is required to use the learning algorithms effectively in the associated application domain.

To analyze the data and extract insights, there exist many machine learning algorithms, summarized in Sect. “ Machine Learning Tasks and Algorithms ”. Thus, selecting a learning algorithm that is suitable for the target application is challenging, because the outcome of different learning algorithms may vary depending on the data characteristics [ 106 ]. Selecting the wrong learning algorithm can produce unexpected outcomes, wasting effort and degrading the model’s effectiveness and accuracy. In terms of model building, the techniques discussed in Sect. “ Machine Learning Tasks and Algorithms ” can be used directly to solve many real-world issues in diverse domains, such as cybersecurity, smart cities, and healthcare, summarized in Sect. “ Applications of Machine Learning ”. However, hybrid learning models, e.g., ensembles of methods, modifications or enhancements of existing learning techniques, or the design of new learning methods, could be potential future work in the area.

Thus, the ultimate success of a machine learning-based solution and its corresponding applications depends mainly on both the data and the learning algorithms. If the data are unsuitable for learning, such as being non-representative, of poor quality, containing irrelevant features, or insufficient in quantity for training, the machine learning models may become useless or produce lower accuracy. Therefore, effectively processing the data and handling the diverse learning algorithms are both important for a machine learning-based solution and, eventually, for building intelligent applications.

Conclusion

In this paper, we have conducted a comprehensive overview of machine learning algorithms for intelligent data analysis and applications. According to our goal, we have briefly discussed how various types of machine learning methods can be used to solve various real-world issues. A successful machine learning model depends on both the data and the performance of the learning algorithms. Sophisticated learning algorithms must be trained on real-world data and knowledge related to the target application before the system can assist with intelligent decision-making. We also discussed several popular application areas based on machine learning techniques to highlight their applicability to various real-world issues. Finally, we summarized and discussed the challenges faced and the potential research opportunities and future directions in the area. The identified challenges create promising research opportunities in the field, which must be addressed with effective solutions in various application areas. Overall, we believe that our study on machine learning-based solutions opens up a promising direction and can serve as a reference guide for potential research and applications for academia, industry professionals, and decision-makers, from a technical point of view.

Canadian Institute for Cybersecurity, University of New Brunswick, ISCX dataset. http://www.unb.ca/cic/datasets/index.html/ (Accessed on 20 October 2019).

CIC-DDoS2019 [online]. Available: https://www.unb.ca/cic/datasets/ddos-2019.html/ (Accessed on 28 March 2020).

World Health Organization (WHO). http://www.who.int/ .

Google Trends. https://trends.google.com/trends/ , 2019.

Adnan N, Nordin Shahrina Md, Rahman I, Noor A. The effects of knowledge transfer on farmers decision making toward sustainable agriculture practices. World J Sci Technol Sustain Dev. 2018.

Agrawal R, Gehrke J, Gunopulos D, Raghavan P. Automatic subspace clustering of high dimensional data for data mining applications. In: Proceedings of the 1998 ACM SIGMOD international conference on Management of data. 1998; 94–105

Agrawal R, Imieliński T, Swami A. Mining association rules between sets of items in large databases. In: ACM SIGMOD Record. ACM. 1993;22: 207–216

Agrawal R, Gehrke J, Gunopulos D, Raghavan P. Fast algorithms for mining association rules. In: Proceedings of the International Joint Conference on Very Large Data Bases, Santiago Chile. 1994; 1215: 487–499.

Aha DW, Kibler D, Albert M. Instance-based learning algorithms. Mach Learn. 1991;6(1):37–66.

Article   Google Scholar  

Alakus TB, Turkoglu I. Comparison of deep learning approaches to predict covid-19 infection. Chaos Solit Fract. 2020;140:

Amit Y, Geman D. Shape quantization and recognition with randomized trees. Neural Comput. 1997;9(7):1545–88.

Ankerst M, Breunig MM, Kriegel H-P, Sander J. Optics: ordering points to identify the clustering structure. ACM Sigmod Record. 1999;28(2):49–60.

Anzai Y. Pattern recognition and machine learning. Elsevier; 2012.

MATH   Google Scholar  

Ardabili SF, Mosavi A, Ghamisi P, Ferdinand F, Varkonyi-Koczy AR, Reuter U, Rabczuk T, Atkinson PM. Covid-19 outbreak prediction with machine learning. Algorithms. 2020;13(10):249.

Article   MathSciNet   Google Scholar  

Baldi P. Autoencoders, unsupervised learning, and deep architectures. In: Proceedings of ICML workshop on unsupervised and transfer learning, 2012; 37–49 .

Balducci F, Impedovo D, Pirlo G. Machine learning applications on agricultural datasets for smart farm enhancement. Machines. 2018;6(3):38.

Boukerche A, Wang J. Machine learning-based traffic prediction models for intelligent transportation systems. Comput Netw. 2020;181

Breiman L. Bagging predictors. Mach Learn. 1996;24(2):123–40.

Article   MATH   Google Scholar  

Breiman L. Random forests. Mach Learn. 2001;45(1):5–32.

Breiman L, Friedman J, Stone CJ, Olshen RA. Classification and regression trees. CRC Press; 1984.

Cao L. Data science: a comprehensive overview. ACM Comput Surv (CSUR). 2017;50(3):43.

Google Scholar  

Carpenter GA, Grossberg S. A massively parallel architecture for a self-organizing neural pattern recognition machine. Comput Vis Graph Image Process. 1987;37(1):54–115.

Chiu C-C, Sainath TN, Wu Y, Prabhavalkar R, Nguyen P, Chen Z, Kannan A, Weiss RJ, Rao K, Gonina E, et al. State-of-the-art speech recognition with sequence-to-sequence models. In: 2018 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP), 2018 pages 4774–4778. IEEE .

Chollet F. Xception: deep learning with depthwise separable convolutions. In: Proceedings of the IEEE conference on computer vision and pattern recognition, pages 1251–1258, 2017.

Cobuloglu H, Büyüktahtakın IE. A stochastic multi-criteria decision analysis for sustainable biomass crop selection. Expert Syst Appl. 2015;42(15–16):6065–74.

Das A, Ng W-K, Woon Y-K. Rapid association rule mining. In: Proceedings of the tenth international conference on Information and knowledge management, pages 474–481. ACM, 2001.

de Amorim RC. Constrained clustering with minkowski weighted k-means. In: 2012 IEEE 13th International Symposium on Computational Intelligence and Informatics (CINTI), pages 13–17. IEEE, 2012.

Dey AK. Understanding and using context. Person Ubiquit Comput. 2001;5(1):4–7.

Eagle N, Pentland AS. Reality mining: sensing complex social systems. Person Ubiquit Comput. 2006;10(4):255–68.

Essien A, Petrounias I, Sampaio P, Sampaio S. Improving urban traffic speed prediction using data source fusion and deep learning. In: 2019 IEEE International Conference on Big Data and Smart Computing (BigComp). IEEE. 2019: 1–8. .

Essien A, Petrounias I, Sampaio P, Sampaio S. A deep-learning model for urban traffic flow prediction with traffic events mined from twitter. In: World Wide Web, 2020: 1–24 .

Ester M, Kriegel H-P, Sander J, Xiaowei X, et al. A density-based algorithm for discovering clusters in large spatial databases with noise. Kdd. 1996;96:226–31.

Fatima M, Pasha M, et al. Survey of machine learning algorithms for disease diagnostic. J Intell Learn Syst Appl. 2017;9(01):1.

Flach PA, Lachiche N. Confirmation-guided discovery of first-order rules with tertius. Mach Learn. 2001;42(1–2):61–95.

Freund Y, Schapire RE, et al. Experiments with a new boosting algorithm. In: Icml, Citeseer. 1996; 96: 148–156

Fujiyoshi H, Hirakawa T, Yamashita T. Deep learning-based image recognition for autonomous driving. IATSS Res. 2019;43(4):244–52.

Fukunaga K, Hostetler L. The estimation of the gradient of a density function, with applications in pattern recognition. IEEE Trans Inform Theory. 1975;21(1):32–40.

Article   MathSciNet   MATH   Google Scholar  

Goodfellow I, Bengio Y, Courville A, Bengio Y. Deep learning. Cambridge: MIT Press; 2016.

Goodfellow I, Pouget-Abadie J, Mirza M, Xu B, Warde-Farley D, Ozair S, Courville A, Bengio Y. Generative adversarial nets. In: Advances in neural information processing systems. 2014: 2672–2680.

Guerrero-Ibáñez J, Zeadally S, Contreras-Castillo J. Sensor technologies for intelligent transportation systems. Sensors. 2018;18(4):1212.

Han J, Pei J, Kamber M. Data mining: concepts and techniques. Amsterdam: Elsevier; 2011.

Han J, Pei J, Yin Y. Mining frequent patterns without candidate generation. In: ACM Sigmod Record, ACM. 2000;29: 1–12.

Harmon SA, Sanford TH, Sheng X, Turkbey EB, Roth H, Ziyue X, Yang D, Myronenko A, Anderson V, Amalou A, et al. Artificial intelligence for the detection of covid-19 pneumonia on chest ct using multinational datasets. Nat Commun. 2020;11(1):1–7.

He K, Zhang X, Ren S, Sun J. Spatial pyramid pooling in deep convolutional networks for visual recognition. IEEE Trans Pattern Anal Mach Intell. 2015;37(9):1904–16.

He K, Zhang X, Ren S, Sun J. Deep residual learning for image recognition. In: Proceedings of the IEEE conference on computer vision and pattern recognition, 2016: 770–778.

Hinton GE. A practical guide to training restricted boltzmann machines. In: Neural networks: Tricks of the trade. Springer. 2012; 599-619

Holte RC. Very simple classification rules perform well on most commonly used datasets. Mach Learn. 1993;11(1):63–90.

Hotelling H. Analysis of a complex of statistical variables into principal components. J Edu Psychol. 1933;24(6):417.

Houtsma M, Swami A. Set-oriented mining for association rules in relational databases. In: Data Engineering, 1995. Proceedings of the Eleventh International Conference on, IEEE.1995:25–33.

Jamshidi M, Lalbakhsh A, Talla J, Peroutka Z, Hadjilooei F, Lalbakhsh P, Jamshidi M, La Spada L, Mirmozafari M, Dehghani M, et al. Artificial intelligence and covid-19: deep learning approaches for diagnosis and treatment. IEEE Access. 2020;8:109581–95.

John GH, Langley P. Estimating continuous distributions in bayesian classifiers. In: Proceedings of the Eleventh conference on Uncertainty in artificial intelligence, Morgan Kaufmann Publishers Inc. 1995; 338–345

Kaelbling LP, Littman ML, Moore AW. Reinforcement learning: a survey. J Artif Intell Res. 1996;4:237–85.

Kamble SS, Gunasekaran A, Gawankar SA. Sustainable industry 4.0 framework: a systematic literature review identifying the current trends and future perspectives. Process Saf Environ Protect. 2018;117:408–25.

Kamble SS, Gunasekaran A, Gawankar SA. Achieving sustainable performance in a data-driven agriculture supply chain: a review for research and applications. Int J Prod Econ. 2020;219:179–94.

Kaufman L, Rousseeuw PJ. Finding groups in data: an introduction to cluster analysis, vol. 344. John Wiley & Sons; 2009.

Keerthi SS, Shevade SK, Bhattacharyya C, Radha Krishna MK. Improvements to platt’s smo algorithm for svm classifier design. Neural Comput. 2001;13(3):637–49.

Khadse V, Mahalle PN, Biraris SV. An empirical comparison of supervised machine learning algorithms for internet of things data. In: 2018 Fourth International Conference on Computing Communication Control and Automation (ICCUBEA), IEEE. 2018; 1–6

Kohonen T. The self-organizing map. Proc IEEE. 1990;78(9):1464–80.

Koroniotis N, Moustafa N, Sitnikova E, Turnbull B. Towards the development of realistic botnet dataset in the internet of things for network forensic analytics: bot-iot dataset. Fut Gen Comput Syst. 2019;100:779–96.

Krizhevsky A, Sutskever I, Hinton GE. Imagenet classification with deep convolutional neural networks. In: Advances in neural information processing systems, 2012: 1097–1105

Kushwaha S, Bahl S, Bagha AK, Parmar KS, Javaid M, Haleem A, Singh RP. Significant applications of machine learning for covid-19 pandemic. J Ind Integr Manag. 2020;5(4).

Lade P, Ghosh R, Srinivasan S. Manufacturing analytics and industrial internet of things. IEEE Intell Syst. 2017;32(3):74–9.

Lalmuanawma S, Hussain J, Chhakchhuak L. Applications of machine learning and artificial intelligence for covid-19 (sars-cov-2) pandemic: a review. Chaos Sol Fract. 2020:110059 .

LeCessie S, Van Houwelingen JC. Ridge estimators in logistic regression. J R Stat Soc Ser C (Appl Stat). 1992;41(1):191–201.

LeCun Y, Bottou L, Bengio Y, Haffner P. Gradient-based learning applied to document recognition. Proc IEEE. 1998;86(11):2278–324.

Liu H, Motoda H. Feature extraction, construction and selection: A data mining perspective, vol. 453. Springer Science & Business Media; 1998.

López G, Quesada L, Guerrero LA. Alexa vs. siri vs. cortana vs. google assistant: a comparison of speech-based natural user interfaces. In: International Conference on Applied Human Factors and Ergonomics, Springer. 2017; 241–250.

Liu B, HsuW, Ma Y. Integrating classification and association rule mining. In: Proceedings of the fourth international conference on knowledge discovery and data mining, 1998.

MacQueen J, et al. Some methods for classification and analysis of multivariate observations. In: Proceedings of the fifth Berkeley symposium on mathematical statistics and probability, 1967;volume 1, pages 281–297. Oakland, CA, USA.

Mahdavinejad MS, Rezvan M, Barekatain M, Adibi P, Barnaghi P, Sheth AP. Machine learning for internet of things data analysis: a survey. Digit Commun Netw. 2018;4(3):161–75.

Marchand A, Marx P. Automated product recommendations with preference-based explanations. J Retail. 2020;96(3):328–43.

McCallum A. Information extraction: distilling structured data from unstructured text. Queue. 2005;3(9):48–57.

Mehrotra A, Hendley R, Musolesi M. Prefminer: mining user’s preferences for intelligent mobile notification management. In: Proceedings of the International Joint Conference on Pervasive and Ubiquitous Computing, Heidelberg, Germany, 12–16 September, 2016; pp. 1223–1234. ACM, New York, USA. .

Mohamadou Y, Halidou A, Kapen PT. A review of mathematical modeling, artificial intelligence and datasets used in the study, prediction and management of covid-19. Appl Intell. 2020;50(11):3913–25.

Mohammed M, Khan MB, Bashier Mohammed BE. Machine learning: algorithms and applications. CRC Press; 2016.

Book   Google Scholar  

Moustafa N, Slay J. Unsw-nb15: a comprehensive data set for network intrusion detection systems (unsw-nb15 network data set). In: 2015 military communications and information systems conference (MilCIS), 2015;pages 1–6. IEEE .

Nilashi M, Ibrahim OB, Ahmadi H, Shahmoradi L. An analytical method for diseases prediction using machine learning techniques. Comput Chem Eng. 2017;106:212–23.

Yujin O, Park S, Ye JC. Deep learning covid-19 features on cxr using limited training data sets. IEEE Trans Med Imaging. 2020;39(8):2688–700.

Otter DW, Medina JR , Kalita JK. A survey of the usages of deep learning for natural language processing. IEEE Trans Neural Netw Learn Syst. 2020.

Park H-S, Jun C-H. A simple and fast algorithm for k-medoids clustering. Expert Syst Appl. 2009;36(2):3336–41.

Liii Pearson K. on lines and planes of closest fit to systems of points in space. Lond Edinb Dublin Philos Mag J Sci. 1901;2(11):559–72.

Pedregosa F, Varoquaux G, Gramfort A, Michel V, Thirion B, Grisel O, Blondel M, Prettenhofer P, Weiss R, Dubourg V, et al. Scikit-learn: machine learning in python. J Mach Learn Res. 2011;12:2825–30.

MathSciNet   MATH   Google Scholar  

Perveen S, Shahbaz M, Keshavjee K, Guergachi A. Metabolic syndrome and development of diabetes mellitus: predictive modeling based on machine learning techniques. IEEE Access. 2018;7:1365–75.

Santi P, Ram D, Rob C, Nathan E. Behavior-based adaptive call predictor. ACM Trans Auton Adapt Syst. 2011;6(3):21:1–21:28.

Polydoros AS, Nalpantidis L. Survey of model-based reinforcement learning: applications on robotics. J Intell Robot Syst. 2017;86(2):153–73.

Puterman ML. Markov decision processes: discrete stochastic dynamic programming. John Wiley & Sons; 2014.

Quinlan JR. Induction of decision trees. Mach Learn. 1986;1:81–106.

Quinlan JR. C4.5: programs for machine learning. Mach Learn. 1993.

Rasmussen C. The infinite gaussian mixture model. Adv Neural Inform Process Syst. 1999;12:554–60.

Ravi K, Ravi V. A survey on opinion mining and sentiment analysis: tasks, approaches and applications. Knowl Syst. 2015;89:14–46.

Rokach L. A survey of clustering algorithms. In: Data mining and knowledge discovery handbook, pages 269–298. Springer, 2010.

Safdar S, Zafar S, Zafar N, Khan NF. Machine learning based decision support systems (dss) for heart disease diagnosis: a review. Artif Intell Rev. 2018;50(4):597–623.

Sarker IH. Context-aware rule learning from smartphone data: survey, challenges and future directions. J Big Data. 2019;6(1):1–25.

Sarker IH. A machine learning based robust prediction model for real-life mobile phone data. Internet Things. 2019;5:180–93.

Sarker IH. Ai-driven cybersecurity: an overview, security intelligence modeling and research directions. SN Comput Sci. 2021.

Sarker IH. Deep cybersecurity: a comprehensive overview from neural network and deep learning perspective. SN Comput Sci. 2021.

Sarker IH, Abushark YB, Alsolami F, Khan A. Intrudtree: a machine learning based cyber security intrusion detection model. Symmetry. 2020;12(5):754.

Sarker IH, Abushark YB, Khan A. Contextpca: predicting context-aware smartphone apps usage based on machine learning techniques. Symmetry. 2020;12(4):499.

Sarker IH, Alqahtani H, Alsolami F, Khan A, Abushark YB, Siddiqui MK. Context pre-modeling: an empirical analysis for classification based user-centric context-aware predictive modeling. J Big Data. 2020;7(1):1–23.

Sarker IH, Alan C, Jun H, Khan AI, Abushark YB, Khaled S. Behavdt: a behavioral decision tree learning to build user-centric context-aware predictive model. Mob Netw Appl. 2019; 1–11.

Sarker IH, Colman A, Kabir MA, Han J. Phone call log as a context source to modeling individual user behavior. In: Proceedings of the 2016 ACM International Joint Conference on Pervasive and Ubiquitous Computing (Ubicomp): Adjunct, Germany, pages 630–634. ACM, 2016.

Sarker IH, Colman A, Kabir MA, Han J. Individualized time-series segmentation for mining mobile phone user behavior. Comput J Oxf Univ UK. 2018;61(3):349–68.

Sarker IH, Hoque MM, MdK Uddin, Tawfeeq A. Mobile data science and intelligent apps: concepts, ai-based modeling and research directions. Mob Netw Appl, pages 1–19, 2020.

Sarker IH, Kayes ASM. Abc-ruleminer: user behavioral rule-based machine learning method for context-aware intelligent services. J Netw Comput Appl. 2020; page 102762

Sarker IH, Kayes ASM, Badsha S, Alqahtani H, Watters P, Ng A. Cybersecurity data science: an overview from machine learning perspective. J Big Data. 2020;7(1):1–29.

Sarker IH, Watters P, Kayes ASM. Effectiveness analysis of machine learning classification models for predicting personalized context-aware smartphone usage. J Big Data. 2019;6(1):1–28.

Sarker IH, Salah K. Appspred: predicting context-aware smartphone apps using random forest learning. Internet Things. 2019;8:

Scheffer T. Finding association rules that trade support optimally against confidence. Intell Data Anal. 2005;9(4):381–95.

Sharma R, Kamble SS, Gunasekaran A, Kumar V, Kumar A. A systematic literature review on machine learning applications for sustainable agriculture supply chain performance. Comput Oper Res. 2020;119:

Shengli S, Ling CX. Hybrid cost-sensitive decision tree, knowledge discovery in databases. In: PKDD 2005, Proceedings of 9th European Conference on Principles and Practice of Knowledge Discovery in Databases. Lecture Notes in Computer Science, volume 3721, 2005.

Shorten C, Khoshgoftaar TM, Furht B. Deep learning applications for covid-19. J Big Data. 2021;8(1):1–54.

Gökhan S, Nevin Y. Data analysis in health and big data: a machine learning medical diagnosis model based on patients’ complaints. Commun Stat Theory Methods. 2019;1–10

Silver D, Huang A, Maddison CJ, Guez A, Sifre L, Van Den Driessche G, Schrittwieser J, Antonoglou I, Panneershelvam V, Lanctot M, et al. Mastering the game of go with deep neural networks and tree search. nature. 2016;529(7587):484–9.

Ślusarczyk B. Industry 4.0: Are we ready? Polish J Manag Stud. 17, 2018.

Sneath Peter HA. The application of computers to taxonomy. J Gen Microbiol. 1957;17(1).

Sorensen T. Method of establishing groups of equal amplitude in plant sociology based on similarity of species. Biol Skr. 1948; 5.

Srinivasan V, Moghaddam S, Mukherji A. Mobileminer: mining your frequent patterns on your phone. In: Proceedings of the International Joint Conference on Pervasive and Ubiquitous Computing, Seattle, WA, USA, 13-17 September, pp. 389–400. ACM, New York, USA. 2014.

Szegedy C, Liu W, Jia Y, Sermanet P, Reed S, Anguelov D, Erhan D, Vanhoucke V, Rabinovich A. Going deeper with convolutions. In: Proceedings of the IEEE conference on computer vision and pattern recognition. 2015; pages 1–9.

Tavallaee M, Bagheri E, Lu W, Ghorbani AA. A detailed analysis of the kdd cup 99 data set. In. IEEE symposium on computational intelligence for security and defense applications. IEEE. 2009;2009:1–6.

Tsagkias M. Tracy HK, Surya K, Vanessa M, de Rijke M. Challenges and research opportunities in ecommerce search and recommendations. In: ACM SIGIR Forum. volume 54. NY, USA: ACM New York; 2021. p. 1–23.

Wagstaff K, Cardie C, Rogers S, Schrödl S, et al. Constrained k-means clustering with background knowledge. Icml. 2001;1:577–84.

Wang W, Yang J, Muntz R, et al. Sting: a statistical information grid approach to spatial data mining. VLDB. 1997;97:186–95.

Wei P, Li Y, Zhang Z, Tao H, Li Z, Liu D. An optimization method for intrusion detection classification model based on deep belief network. IEEE Access. 2019;7:87593–605.

Weiss K, Khoshgoftaar TM, Wang DD. A survey of transfer learning. J Big data. 2016;3(1):9.

Witten IH, Frank E. Data Mining: Practical machine learning tools and techniques. Morgan Kaufmann; 2005.

Witten IH, Frank E, Trigg LE, Hall MA, Holmes G, Cunningham SJ. Weka: practical machine learning tools and techniques with java implementations. 1999.

Wu C-C, Yen-Liang C, Yi-Hung L, Xiang-Yu Y. Decision tree induction with a constrained number of leaf nodes. Appl Intell. 2016;45(3):673–85.

Wu X, Kumar V, Quinlan JR, Ghosh J, Yang Q, Motoda H, McLachlan GJ, Ng A, Liu B, Philip SY, et al. Top 10 algorithms in data mining. Knowl Inform Syst. 2008;14(1):1–37.

Xin Y, Kong L, Liu Z, Chen Y, Li Y, Zhu H, Gao M, Hou H, Wang C. Machine learning and deep learning methods for cybersecurity. IEEE Access. 2018;6:35365–81.

Xu D, Yingjie T. A comprehensive survey of clustering algorithms. Ann Data Sci. 2015;2(2):165–93.

Zaki MJ. Scalable algorithms for association mining. IEEE Trans Knowl Data Eng. 2000;12(3):372–90.

Zanella A, Bui N, Castellani A, Vangelista L, Zorzi M. Internet of things for smart cities. IEEE Internet Things J. 2014;1(1):22–32.

Zhao Q, Bhowmick SS. Association rule mining: a survey. Singapore: Nanyang Technological University; 2003.

Zheng T, Xie W, Xu L, He X, Zhang Y, You M, Yang G, Chen Y. A machine learning-based framework to identify type 2 diabetes through electronic health records. Int J Med Inform. 2017;97:120–7.

Zheng Y, Rajasegarar S, Leckie C. Parking availability prediction for sensor-enabled car parks in smart cities. In: Intelligent Sensors, Sensor Networks and Information Processing (ISSNIP), 2015 IEEE Tenth International Conference on. IEEE, 2015; pages 1–6.

Zhu H, Cao H, Chen E, Xiong H, Tian J. Exploiting enriched contextual information for mobile app classification. In: Proceedings of the 21st ACM international conference on Information and knowledge management. ACM, 2012; pages 1617–1621

Zhu H, Chen E, Xiong H, Kuifei Y, Cao H, Tian J. Mining mobile user preferences for personalized context-aware recommendation. ACM Trans Intell Syst Technol (TIST). 2014;5(4):58.

Zikang H, Yong Y, Guofeng Y, Xinyu Z. Sentiment analysis of agricultural product ecommerce review data based on deep learning. In: 2020 International Conference on Internet of Things and Intelligent Applications (ITIA), IEEE, 2020; pages 1–7

Zulkernain S, Madiraju P, Ahamed SI. A context aware interruption management system for mobile devices. In: Mobile Wireless Middleware, Operating Systems, and Applications. Springer. 2010; pages 221–234

Zulkernain S, Madiraju P, Ahamed S, Stamm K. A mobile intelligent interruption management system. J UCS. 2010;16(15):2060–80.

Download references

Author information

Authors and affiliations

Iqbal H. Sarker

Swinburne University of Technology, Melbourne, VIC, 3122, Australia

Department of Computer Science and Engineering, Chittagong University of Engineering & Technology, 4349, Chattogram, Bangladesh

Corresponding author

Correspondence to Iqbal H. Sarker .

Ethics declarations

Conflict of interest

The author declares no conflict of interest.

Additional information

Publisher's note

Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.

This article is part of the topical collection “Advances in Computational Approaches for Artificial Intelligence, Image Processing, IoT and Cloud Applications” guest edited by Bhanu Prakash K N and M. Shivakumar.


About this article

Sarker, I.H. Machine Learning: Algorithms, Real-World Applications and Research Directions. SN COMPUT. SCI. 2 , 160 (2021). https://doi.org/10.1007/s42979-021-00592-x

Received : 27 January 2021

Accepted : 12 March 2021

Published : 22 March 2021

DOI : https://doi.org/10.1007/s42979-021-00592-x


Keywords:

  • Machine learning
  • Deep learning
  • Artificial intelligence
  • Data science
  • Data-driven decision-making
  • Predictive analytics
  • Intelligent applications

A Systematic Literature Review of A* Pathfinding

  • January 2021
  • Procedia Computer Science 179(11):507-514
  • CC BY-NC-ND 4.0

Abstract and Figures

Figure: Selection of Algorithms Comparative to A-Star




Searching Algorithms

Searching algorithms are essential tools in computer science used to locate specific items within a collection of data. These algorithms are designed to efficiently navigate through data structures to find the desired information, making them fundamental in various applications such as databases, web search engines, and more.


Table of Contents

What is Searching?

  • Searching terminologies
  • Importance of Searching in DSA
  • Applications of Searching
  • Basics of Searching Algorithms
  • Comparisons Between Different Searching Algorithms
  • Library Implementations of Searching Algorithms
  • Easy Problems on Searching
  • Medium Problems on Searching
  • Hard Problems on Searching

Searching is the fundamental process of locating a specific element or item within a collection of data. This collection of data can take various forms, such as arrays, lists, trees, or other structured representations. The primary objective of searching is to determine whether the desired element exists within the data, and if so, to identify its precise location or retrieve it. It plays an important role in various computational tasks and real-world applications, including information retrieval, data analysis, decision-making processes, and more.
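The process just described can be sketched with the simplest possible search: scan the collection and report the position of the target, or signal that it is absent (illustrative data, Python):

```python
def linear_search(items, target):
    """Return the index of `target` in `items`, or -1 if it is not present."""
    for i, value in enumerate(items):
        if value == target:
            return i
    return -1

data = [14, 3, 27, 8, 19]
print(linear_search(data, 27))   # -> 2 (found at index 2)
print(linear_search(data, 100))  # -> 1 (not present)
```

Every more sophisticated searching algorithm refines this same contract: given a search space and a target, return the target's location or report its absence.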

Searching terminologies:

Target element:.

In searching, there is always a specific target element or item that you want to find within the data collection. This target could be a value, a record, a key, or any other data entity of interest.

Search Space:

The search space refers to the entire collection of data within which you are looking for the target element. Depending on the data structure used, the search space may vary in size and organization.

Complexity:

Searching can have different levels of complexity depending on the data structure and the algorithm used. The complexity is often measured in terms of time and space requirements.
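To make the time cost concrete, the sketch below (hypothetical instrumented helpers) counts the comparisons performed by a linear scan versus binary search on the same sorted input of 1,024 items:

```python
def linear_comparisons(items, target):
    # Count comparisons made by a linear scan.
    count = 0
    for value in items:
        count += 1
        if value == target:
            break
    return count

def binary_comparisons(items, target):
    # Count comparisons made by binary search on sorted input.
    lo, hi, count = 0, len(items) - 1, 0
    while lo <= hi:
        mid = (lo + hi) // 2
        count += 1
        if items[mid] == target:
            break
        if items[mid] < target:
            lo = mid + 1
        else:
            hi = mid - 1
    return count

sorted_data = list(range(1024))
print(linear_comparisons(sorted_data, 1000))  # -> 1001
print(binary_comparisons(sorted_data, 1000))  # -> 10
```

Roughly n comparisons versus about log2(n): this gap is what the complexity notations O(n) and O(log n) capture.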

Deterministic vs. Non-deterministic:

Most classic searching algorithms, including linear search and binary search, are deterministic: given the same input, they always examine elements in the same order and produce the same result. Non-deterministic (randomized) algorithms, such as randomized quickselect, instead make random choices during the search, so the exact sequence of steps can vary between runs even on identical input.
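Binary search's systematic behavior can be observed directly by recording which indices it probes; on the same input it always probes the same sequence (a small illustrative sketch):

```python
def binary_search_trace(items, target):
    """Binary search over sorted `items` that also records the probed indices."""
    lo, hi, probes = 0, len(items) - 1, []
    while lo <= hi:
        mid = (lo + hi) // 2
        probes.append(mid)
        if items[mid] == target:
            return mid, probes
        if items[mid] < target:
            lo = mid + 1
        else:
            hi = mid - 1
    return -1, probes

data = list(range(0, 32, 2))  # [0, 2, ..., 30], 16 sorted elements
print(binary_search_trace(data, 22))  # -> (11, [7, 11]): same input, same probes
```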

Importance of Searching in DSA:

  • Efficiency:  Efficient searching algorithms improve program performance.
  • Data Retrieval:  Quickly find and retrieve specific data from large datasets.
  • Database Systems:  Enables fast querying of databases.
  • Problem Solving:  Used in a wide range of problem-solving tasks.

Applications of Searching:

Searching algorithms have numerous applications across various fields. Here are some common applications:

  • Information Retrieval: Search engines like Google, Bing, and Yahoo use sophisticated searching algorithms to retrieve relevant information from vast amounts of data on the web.
  • Database Systems: Searching is fundamental in database systems for retrieving specific data records based on user queries, improving efficiency in data retrieval.
  • E-commerce: Searching is crucial in e-commerce platforms for users to find products quickly based on their preferences, specifications, or keywords.
  • Networking: In networking, searching algorithms are used for routing packets efficiently through networks, finding optimal paths, and managing network resources.
  • Artificial Intelligence: Searching algorithms play a vital role in AI applications, such as problem-solving, game playing (e.g., chess), and decision-making processes.
  • Pattern Recognition: Searching algorithms are used in pattern matching tasks, such as image recognition, speech recognition, and handwriting recognition.

Basics of Searching Algorithms:

  • Introduction to Searching – Data Structure and Algorithm Tutorial
  • Importance of searching in Data Structure
  • What is the purpose of the search algorithm?

Searching Algorithms:

  • Meta Binary Search | One-Sided Binary Search
  • The Ubiquitous Binary Search

Comparisons Between Different Searching Algorithms:

  • Linear Search vs Binary Search
  • Interpolation search vs Binary search
  • Why is Binary Search preferred over Ternary Search?
  • Is Sentinel Linear Search better than normal Linear Search?

Library Implementations of Searching Algorithms:

  • Binary Search functions in C++ STL (binary_search, lower_bound and upper_bound)
  • Arrays.binarySearch() in Java with examples | Set 1
  • Arrays.binarySearch() in Java with examples | Set 2 (Search in subarray)
  • Collections.binarySearch() in Java with Examples
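For comparison with the C++ and Java library functions listed above, Python's standard `bisect` module provides the analogous lower-bound/upper-bound operations on sorted sequences (shown here as an illustration; it is not part of the list above):

```python
import bisect

data = [10, 20, 20, 20, 30, 40]

# bisect_left: first position where the value could be inserted (like lower_bound).
print(bisect.bisect_left(data, 20))   # -> 1
# bisect_right: position past the last equal element (like upper_bound).
print(bisect.bisect_right(data, 20))  # -> 4

# Membership test in O(log n): present iff the element sits at bisect_left.
i = bisect.bisect_left(data, 30)
print(i < len(data) and data[i] == 30)  # -> True
```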

Easy Problems on Searching:

  • Find the largest three elements in an array
  • Find the Missing Number
  • Find the first repeating element in an array of integers
  • Find the missing and repeating number
  • Search, insert and delete in a sorted array
  • Count 1’s in a sorted binary array
  • Two elements whose sum is closest to zero
  • Find a pair with the given difference
  • k largest(or smallest) elements in an array
  • Kth smallest element in a row-wise and column-wise sorted 2D array
  • Find common elements in three sorted arrays
  • Ceiling in a sorted array
  • Floor in a Sorted Array
  • Find the maximum element in an array which is first increasing and then decreasing
  • Given an array of size n and a number k, find all elements that appear more than n/k times
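As an example from the list above, "Find the Missing Number" admits an O(n)-time, O(1)-space solution via the arithmetic-series sum (one common approach, not the only one):

```python
def find_missing(nums, n):
    """Given n-1 distinct numbers drawn from 1..n, return the missing one."""
    expected = n * (n + 1) // 2  # sum of 1..n
    return expected - sum(nums)

print(find_missing([1, 2, 4, 6, 3, 7, 8], 8))  # -> 5
```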

Medium Problems on Searching:

  • Find all triplets with zero sum
  • Find the element before which all the elements are smaller than it, and after which all are greater
  • Find the largest pair sum in an unsorted array
  • K’th Smallest/Largest Element in Unsorted Array
  • Search an element in a sorted and rotated array
  • Find the minimum element in a sorted and rotated array
  • Find a peak element
  • Maximum and minimum of an array using minimum number of comparisons
  • Find a Fixed Point in a given array
  • Find the k most frequent words from a file
  • Find k closest elements to a given value
  • Given a sorted array and a number x, find the pair in array whose sum is closest to x
  • Find the closest pair from two sorted arrays
  • Find three closest elements from given three sorted arrays
  • Binary Search for Rational Numbers without using floating point arithmetic
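
Several of these problems reduce to a modified binary search. For example, "Search an element in a sorted and rotated array" can be sketched as below; this version assumes distinct elements (with duplicates, the sorted-half test becomes ambiguous and the worst case degrades to O(n)):

```python
def search_rotated(arr, target):
    """Binary search in a sorted array rotated at an unknown pivot; O(log n)."""
    lo, hi = 0, len(arr) - 1
    while lo <= hi:
        mid = (lo + hi) // 2
        if arr[mid] == target:
            return mid
        if arr[lo] <= arr[mid]:              # left half is sorted
            if arr[lo] <= target < arr[mid]:
                hi = mid - 1                 # target lies in the sorted left half
            else:
                lo = mid + 1
        else:                                # right half is sorted
            if arr[mid] < target <= arr[hi]:
                lo = mid + 1                 # target lies in the sorted right half
            else:
                hi = mid - 1
    return -1
```

The key observation is that at least one half of any rotated range is still sorted, so a constant-time range check tells us which half to discard.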

Hard Problems on Searching:

  • Median of two sorted arrays
  • Median of two sorted arrays of different sizes
  • Search in an almost sorted array
  • Find position of an element in a sorted array of infinite numbers
  • Given a sorted and rotated array, find if there is a pair with a given sum
  • K’th Smallest/Largest Element in Unsorted Array | Worst case Linear Time
  • K’th largest element in a stream
  • Best First Search (Informed Search)
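
As one illustration from this set, "K'th largest element in a stream" is commonly solved with a min-heap capped at size k: the heap root is then always the k-th largest value seen so far. A Python sketch using the standard `heapq` module (the generator shape and `None` placeholder are illustrative choices):

```python
import heapq

def kth_largest_stream(k, stream):
    """Yield the k-th largest element seen so far (None until k elements arrive)."""
    heap = []                         # min-heap holding the k largest so far
    for x in stream:
        heapq.heappush(heap, x)
        if len(heap) > k:
            heapq.heappop(heap)       # evict the smallest; k largest remain
        yield heap[0] if len(heap) == k else None
```

Each element costs O(log k), and memory stays at O(k) regardless of stream length.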

Quick Links:

  • ‘Practice Problems’ on Searching
  • ‘Quizzes’ on Searching

Recommended:

  • Learn Data Structure and Algorithms | DSA Tutorial


Algorithms & optimization

We perform fundamental research in algorithms, markets, optimization, and graph analysis, and use it to deliver solutions to challenges across Google’s business.

Google offices

About the team

Our team comprises multiple overlapping research groups working on graph mining, large-scale optimization, and market algorithms. We collaborate closely with teams across Google, benefiting Ads, Search, YouTube, Play, Infrastructure, Geo, Social, Image Search, Cloud and more. Along with these collaborations, we perform research related to algorithmic foundations of machine learning, distributed optimization, economics, data mining, and data-driven optimization. Our researchers are involved in both long-term research efforts and immediate applications of our technology.

Examples of recent research interests include online ad allocation problems, distributed algorithms for large-scale graph mining, mechanism design for advertising exchanges, and robust and dynamic pricing for ad auctions.

Team focus summaries

Large-scale optimization.

Our mission is to develop large-scale, distributed, and data-driven optimization techniques and use them to improve the efficiency and robustness of infrastructure and machine learning systems at Google. We achieve such goals as increasing throughput and decreasing latency in distributed systems, or improving feature selection and parameter tuning in machine learning. To do this, we apply techniques from areas such as combinatorial optimization, online algorithms, and control theory. Our research is used in critical infrastructure that supports products such as Search and Cloud.

Understanding places

Our mission is to discover all the world’s places and to understand people’s interactions with those places. We accomplish this by using ML to develop deep understanding of user trajectories and actions in the physical world, and we apply that understanding to solve the recurrent hard problems in geolocation data analysis. This research has enabled many of the novel features that appear in Google geo applications such as Maps.

Structured information extraction

Our mission is to extract salient information from templated documents and web pages and then use that information to assist users. We focus our efforts on extracting data such as flight information from email, event data from the web, and product information from the web. This enables many features in products such as Google Now, Search, and Shopping.

Search and information retrieval

Our mission is to conduct research to enable new or more effective search capabilities. This includes developing a deeper understanding of correlations between documents and queries; modeling user attention and product satisfaction; developing Q&A models, particularly for the “next billion Internet users”; and developing effective personal search models even when Google engineers cannot inspect private user input data.

Medical knowledge and learning

Our mission is to offer a premier source of high-quality medical information along your entire online health journey. We provide relevant, targeted medical information to users by applying advanced ML on Google Search data. Examples of technologies created by this team include Symptom Search, Allergy Prediction, and other epidemiological applications.


Some of our locations

Cambridge

Some of our people

Gagan Aggarwal

  • Algorithms and Theory
  • Data Mining and Modeling
  • Economics and Electronic Commerce

David Applegate

Aaron Archer

  • Distributed Systems and Parallel Computing

Ashwinkumar Badanidiyuru Varadaraja

  • Machine Intelligence

Mohammadhossein Bateni

Michael Bendersky

  • Information Retrieval and the Web

Kshipra Bhawalkar

Edith Cohen

  • Machine Learning

Alessandro Epasto

Alejandra Estanislao

Andrei Z. Broder

Jon Feldman

Nadav Golbandi

Jeongwoo Ko

  • Natural Language Processing

Marc Najork

Nitish Korula

Kostas Kollias

Silvio Lattanzi

Cheng Li

Mohammad Mahdian

Alex Fabrikant

Rich Washington

Qi Zhao

Andrew Tomkins

  • Human-Computer Interaction and Visualization

Vidhya Navalpakkam

  • Machine Perception

Bhargav Kanagal

Aranyak Mehta

Guillaume Chatelet

  • Hardware and Architecture
  • Software Engineering
  • Software Systems

Sandeep Tata

  • Data Management

Balasubramanian Sivan

Vahab S. Mirrokni

Yuan Wang

Xuanhui Wang

Renato Paes Leme

Bryan Perozzi

Morteza Zadimoghaddam

Fabien Viger

Tamas Sarlos

James B. Wendt

We're always looking for more talented, passionate people.

Careers


COMMENTS

  1. Comparative Analysis of Search Algorithms

  2. A Brief Study and Analysis of Different Searching Algorithms

    In this paper, a random search algorithm based on multiple solution vectors is presented. The conventional random search method has a single solution vector and has difficulties associated with local ...

  3. A brief study and analysis of different searching algorithms

    This paper reviews several important and well-discussed traditional and proposed search algorithms with respect to their time complexity, space complexity, merits, and demerits, illustrated through realized applications. It also highlights their working principles.

  4. A Systematic Literature Review of A* Pathfinding

  5. Analysis of Searching Algorithms in Solving Modern Engineering Problems

    Many current engineering problems have been solved using artificial intelligence search algorithms. This article exhibits and discusses practical applications of the A*, Breadth-First Search, Greedy, and Depth-First Search algorithms, focusing on key algorithms that have served as the foundation for many others in use today.

  6. PDF The Anatomy of a Search Engine

    google.pdf - Stanford InfoLab

  7. A systematic approach to searching: an efficient and complete method to develop literature searches

  8. Comparative Analysis of Search Algorithms in AI

    ... the most difficult problems in computer science and engineering. These tools are: search and optimization, logic, probabilistic methods for uncertain reasoning, classifiers and statistical ...

  9. Optimizing Search and Sort Algorithms: Harnessing ...

    This study investigates the impact of parallel programming techniques on the performance of searching and sorting algorithms. Traditional sequential algorithms have been the foundation of data processing for decades, but the increasing availability of parallel computing resources opens new possibilities for improving efficiency and reducing execution times.

  10. Interpolated binary search: An efficient hybrid search algorithm on

    Binary and interpolation algorithms are commonly used to search ordered datasets in many applications. This paper proposes a hybrid algorithm for searching ordered datasets based on the ideas of interpolation and binary search, called Interpolated Binary Search (IBS).

  11. SOAR: New algorithms for even faster vector search with ScaNN

  12. Harmony search algorithm and related variants: A systematic review

    Harmony memory size (HMS) is an important parameter of HS and an important indicator of its global search capability. Normally, the larger the HMS, the stronger the ability to find the globally optimal region; however, since HS starts from multiple points, a larger HMS increases the computational cost of the algorithm, which will affect ...

  13. A survey on sparrow search algorithms and their applications

    The sparrow search algorithm (SSA) is an efficient swarm-intelligence-based algorithm that has made significant advances since its introduction in 2020. This paper presents a detailed overview of the basic SSA and several SSA-based variants.

  14. [1912.06059] Grid Search, Random Search, Genetic Algorithm: A Big

  15. Search Algorithm Research Papers

    View Search Algorithm Research Papers on Academia.edu for free. ... guarantees logarithmic performance of searches, O(n log n). Search algorithms are given for partial match queries with t keys specified [proven maximum running time of O(n(k-t)/k)] and for nearest neighbor queries [empirically observed average ...

  16. Search algorithm

  17. Quantifying computational advantage of Grover's algorithm with the

    Grover's search algorithm [33] is one of the most important protocols of quantum computation [1,2]. It searches an unstructured database of N elements for a target ...

  18. Machine Learning: Algorithms, Real-World Applications and Research

  19. A Systematic Literature Review of A* Pathfinding

    A* is a search algorithm that has long been used in the pathfinding research community. Its efficiency, simplicity, and modularity are often highlighted as its strengths compared to other tools ...

  20. Algorithms and Theory

    Google's mission presents many exciting algorithmic and optimization challenges across different product areas including Search, Ads, Social, and Google Infrastructure. These include optimizing internal systems, such as scheduling the machines that power the numerous computations done each day, as well as optimizations ...

  21. Semantic Scholar

    Semantic Scholar | AI-Powered Research Tool

  22. Searching Algorithms

  23. Algorithms & Optimization