
Neural Architecture Search

How does neural architecture search build better neural networks than humans? See how NAS works, the top methods, real costs, and what comes next.
Neural architecture search workflow showing search space, search strategy, and evaluation in machine learning

Introduction

Neural architecture search (NAS) has quietly reshaped how many of the strongest deep learning models are built. For years, designing a neural network meant relying on human intuition, trial and error, and a lot of wasted compute. NAS replaces that manual guesswork with algorithms that explore thousands of candidate designs automatically. Research has compressed search times from over 2,000 GPU days down to roughly two or three GPU days for differentiable methods, according to a gradient-based NAS evaluation. Models discovered this way now power phone cameras, medical imaging, and large image classification systems. Understanding how the technology works explains why automated design so often beats hand-tuned baselines built from an understanding of how neural networks work. This guide walks through the methods, the trade-offs, the risks, and where the field is heading.

Quick Answers on NAS

What is neural architecture search?

Neural architecture search is an automated process that uses algorithms to design optimal neural network structures for a task, replacing manual, intuition-driven model design with systematic exploration.

How does neural architecture search work?

It defines a search space of possible architectures, applies a search strategy to explore that space, and uses an evaluation method to score each candidate on accuracy and efficiency.

Why does neural architecture search matter?

Neural architecture search matters because it finds models that often beat hand-designed networks on accuracy and speed, while cutting design time and powering efficient mobile vision systems.

Key Takeaways

  • NAS automates neural network design using a search space, a search strategy, and a performance evaluation method.
  • The main strategies are reinforcement learning, evolutionary algorithms, and differentiable methods like DARTS, which differ sharply in compute cost.
  • Famous models such as NASNet, MnasNet, and EfficientNet were discovered through automated search rather than hand design.
  • The biggest challenges are compute cost, carbon footprint, reproducibility, and bias baked into the search space.

What Is Neural Architecture Search in Machine Learning?

Neural architecture search is a machine learning method that automates neural network design. It combines a search space, a search strategy, and a performance evaluation method to discover architectures that maximize accuracy and efficiency for a specific task, with far less manual tuning.


The Three Building Blocks of Every NAS System

Every NAS system rests on three components that work together as one loop. The first is the search space, which defines every possible architecture the algorithm is allowed to consider. The search space sets the boundaries of creativity, because no method can discover a design that the space does not permit. A well-designed space might include choices about layer types, connections, filter sizes, and depth. A space that is too narrow misses strong designs, while one that is too broad becomes impossibly expensive to explore. Engineers spend real effort shaping this space before any search begins, drawing on what they know about how neural networks work.

The second component is the search strategy, which decides how the algorithm moves through that space of candidates. Some strategies sample architectures at random, while others learn from past attempts to make smarter choices. The strategy balances exploration of new regions against exploitation of designs that already look promising. A good strategy avoids wasting compute on obviously weak candidates early in the process. This is where reinforcement learning, evolution, and gradient methods diverge in important ways. The choice of strategy often determines whether a search finishes in hours or in weeks.

The third component is the performance estimation strategy, which scores how good each candidate architecture is. The naive approach trains every candidate fully, but that quickly becomes far too slow and expensive. Practical systems use proxies, such as training on smaller datasets or for fewer epochs, as the NAS overview on Wikipedia explains. Weight sharing and predictor models can estimate quality without full training runs. These shortcuts trade a little accuracy for enormous savings in time. Together, the three blocks form the engine that makes automated design feasible at scale.
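A minimal Python sketch can make the three blocks concrete. Everything in it is hypothetical: `SEARCH_SPACE` is a made-up space of four design decisions, plain random sampling stands in for the search strategy, and `proxy_score` is a toy formula standing in for a short training run or a learned predictor.

```python
import random

# Block 1, the search space (hypothetical): each design decision and its allowed options.
SEARCH_SPACE = {
    "num_layers": [2, 4, 6, 8],
    "filters": [16, 32, 64],
    "kernel_size": [3, 5, 7],
    "skip_connections": [False, True],
}


def sample_architecture(rng: random.Random) -> dict:
    """Block 2, the search strategy: here, pick one option per decision at random."""
    return {name: rng.choice(options) for name, options in SEARCH_SPACE.items()}


def proxy_score(arch: dict) -> float:
    """Block 3, performance estimation (hypothetical toy formula).

    A real system would briefly train the candidate or query a predictor.
    This formula rewards depth near 6 and extra width, penalizes compute,
    and gives a small bonus for skip connections, so the loop has a target.
    """
    depth_term = -abs(arch["num_layers"] - 6)
    width_term = arch["filters"] / 32
    cost_penalty = 0.01 * arch["num_layers"] * arch["filters"] * arch["kernel_size"] ** 2 / 64
    skip_bonus = 0.5 if arch["skip_connections"] else 0.0
    return depth_term + width_term - cost_penalty + skip_bonus


def random_search(num_trials: int, seed: int = 0) -> tuple[dict, float]:
    """Run the loop: sample, score, and keep the best candidate seen so far."""
    rng = random.Random(seed)
    best_arch, best_score = None, float("-inf")
    for _ in range(num_trials):
        arch = sample_architecture(rng)
        score = proxy_score(arch)
        if score > best_score:
            best_arch, best_score = arch, score
    return best_arch, best_score


best, score = random_search(num_trials=50)
print(best, round(score, 3))
```

Replacing random sampling with a smarter strategy would change only `sample_architecture`; the space and the estimator stay the same, which is why the three components are usually discussed separately.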

Implementing NAS Step by Step

Building on those three blocks, a full search follows a repeatable cycle from start to finish. The process begins when engineers define the search space and the constraints they care about, such as accuracy targets or latency limits. A clear objective at the start keeps the search honest, because the algorithm optimizes exactly what you measure. Teams often choose tasks like image classification, where strong benchmarks make scoring straightforward. They also pick the hardware budget, since compute availability shapes which methods are realistic. This planning phase looks a lot like the early steps of any serious machine learning project. Skipping it usually leads to searches that wander without producing a useful model.

Once setup is done, the search strategy proposes a first batch of candidate architectures to evaluate. Each candidate is built, trained briefly or scored by a proxy, and then graded on the chosen metric. The strategy reads those grades and updates its internal model of what good designs look like. It then proposes a new batch that leans toward the patterns that scored well. This feedback loop repeats for many rounds, gradually concentrating effort on stronger regions of the space. Because the loop builds, trains, and scores many models automatically, tooling matters here, which is why teams weigh the best programming languages for machine learning.

As the loop runs, the system tracks the best candidates found so far and watches for diminishing returns. When new rounds stop producing better scores, the search has likely converged on a strong design. Engineers then take the top architecture and train it fully on the complete dataset. This final training step produces the production-ready weights that the proxy scores only estimated. Careful validation guards against picking a model that overfit the proxy task rather than the real one. Cross-checking on held-out data, similar to using cross-validation to reduce overfitting, protects against that trap.
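The propose, grade, and update cycle, including a stop rule for diminishing returns, might look roughly like this. It is a toy under stated assumptions: the space and `toy_score` are hypothetical, and the update (raise the sampling weight of options that appear in the top-scoring candidates) is a simple cross-entropy-style heuristic picked for brevity, not the rule of any specific NAS system. The final full training of the winner on the complete dataset is outside the sketch.

```python
import random

# Hypothetical search space for illustration.
SEARCH_SPACE = {
    "num_layers": [2, 4, 6, 8],
    "filters": [16, 32, 64],
    "kernel_size": [3, 5, 7],
}


def toy_score(arch):
    # Hypothetical proxy: best possible score is 0 at 6 layers, 32 filters, kernel 5.
    return -(abs(arch["num_layers"] - 6)
             + abs(arch["filters"] - 32) / 16
             + abs(arch["kernel_size"] - 5) / 2)


def search_loop(batch_size=8, elite_frac=0.25, patience=3, max_rounds=50, seed=0):
    rng = random.Random(seed)
    # Per-decision sampling weights, starting uniform.
    weights = {k: [1.0] * len(v) for k, v in SEARCH_SPACE.items()}
    best_arch, best_score, stale_rounds = None, float("-inf"), 0
    for round_idx in range(max_rounds):
        # 1. Propose a batch from the current sampling distribution.
        batch = [
            {k: rng.choices(opts, weights=weights[k])[0] for k, opts in SEARCH_SPACE.items()}
            for _ in range(batch_size)
        ]
        # 2. Grade every candidate, best first.
        scored = sorted(((toy_score(a), a) for a in batch), key=lambda t: t[0], reverse=True)
        # 3. Track the best design; stop after `patience` rounds without improvement.
        if scored[0][0] > best_score:
            best_score, best_arch, stale_rounds = scored[0][0], scored[0][1], 0
        else:
            stale_rounds += 1
            if stale_rounds >= patience:
                break
        # 4. Lean toward options that appear in the elite candidates.
        n_elite = max(1, int(batch_size * elite_frac))
        for _, arch in scored[:n_elite]:
            for k, opts in SEARCH_SPACE.items():
                weights[k][opts.index(arch[k])] += 1.0
    return best_arch, best_score, round_idx + 1


best_arch, best_score, rounds_used = search_loop()
print(best_arch, best_score, rounds_used)
```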

The last stage moves the discovered architecture from the lab into deployment. Engineers convert the model for its target platform, whether a cloud server, a phone, or an embedded chip. They measure real latency and memory use, not just the proxies used during search. Sometimes the model needs small manual adjustments to fit hardware constraints that the search missed. Monitoring continues after launch, since real-world data can drift away from the training distribution. This end-to-end flow shows that neural architecture search is a pipeline, not a single magic button. Each stage carries its own risks that careful teams plan for in advance.

Reinforcement Learning as a NAS Search Strategy

Turning to specific strategies, reinforcement learning was the approach that first put neural architecture search on the map. In this setup, a controller network proposes architectures and receives a reward based on how well each one performs. The controller learns over time to favor design choices that earned high rewards in earlier rounds. Google's early AutoML work used this method to design networks for image classification tasks. The controller is often trained with policy gradients, a technique drawn from reinforcement learning algorithms. The results were impressive and showed that machines could design competitive networks.
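A minimal sketch of the controller idea, assuming a tiny two-decision search space and a hypothetical `toy_reward` in place of the validation accuracy of a trained child network. It uses the REINFORCE policy-gradient update with a moving-average baseline; early controllers were recurrent networks, whereas here each decision simply gets its own table of logits to keep the example short.

```python
import math
import random

# Hypothetical search space: one operation choice and one width choice.
CHOICES = {"op": ["conv3x3", "conv5x5", "maxpool", "identity"], "width": [16, 32, 64]}


def toy_reward(arch):
    # Hypothetical reward standing in for child-network accuracy.
    return (1.0 if arch["op"] == "conv5x5" else 0.2) + (0.5 if arch["width"] == 32 else 0.0)


def softmax(logits):
    m = max(logits)
    exps = [math.exp(x - m) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]


def reinforce_search(steps=1000, lr=0.1, baseline_decay=0.9, seed=0):
    rng = random.Random(seed)
    logits = {k: [0.0] * len(v) for k, v in CHOICES.items()}
    baseline = 0.0
    for _ in range(steps):
        # The controller samples one option per decision from its current policy.
        idx = {k: rng.choices(range(len(v)), weights=softmax(logits[k]))[0]
               for k, v in CHOICES.items()}
        arch = {k: CHOICES[k][i] for k, i in idx.items()}
        reward = toy_reward(arch)
        advantage = reward - baseline  # the baseline lowers gradient variance
        baseline = baseline_decay * baseline + (1 - baseline_decay) * reward
        # Policy-gradient step: d log softmax_i / d logit_j = onehot_i[j] - probs[j].
        for k, i in idx.items():
            probs = softmax(logits[k])
            for j in range(len(probs)):
                grad = (1.0 if j == i else 0.0) - probs[j]
                logits[k][j] += lr * advantage * grad
    # Report the most likely option for each decision.
    return {k: CHOICES[k][max(range(len(v)), key=lambda j: logits[k][j])]
            for k, v in CHOICES.items()}


print(reinforce_search())
```

In a real search, every call to the reward function means training a child network, which is where the enormous costs described below come from.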

The catch with reinforcement learning was its staggering cost in compute and time. Early searches consumed around 2,000 GPU days to find a single strong architecture, as the gradient-based NAS survey documents. That price put the method out of reach for most teams outside large, well-funded labs. The controller also needed many full or partial training runs to gather useful reward signals. Researchers respected the quality of the results but pushed hard for cheaper alternatives. That pressure helped drive the field toward evolutionary and gradient-based methods.

Evolutionary Algorithms for Architecture Discovery

Shifting to a nature-inspired idea, evolutionary algorithms treat architecture design like biological selection. The method starts with a population of random architectures and scores each one on the target task. The strongest designs survive and reproduce, while weaker ones are discarded from the population. Reproduction happens through mutation, which tweaks a design, and sometimes crossover, which blends two parents. Over many generations, the population drifts toward architectures that perform better. This approach produced AmoebaNet, a model that matched or beat reinforcement-learning designs on image benchmarks.
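The mutate-and-select cycle can be sketched with the aging ("regularized") variant of evolution associated with AmoebaNet, in which the oldest model rather than the worst one leaves the population each cycle. The operation list, six-layer encoding, and `toy_fitness` function are hypothetical; in a real search, fitness would come from training the candidate or from a proxy.

```python
import collections
import random

# Hypothetical encoding: an architecture is a list of one operation per layer.
OPS = ["conv3x3", "conv5x5", "sep_conv3x3", "maxpool", "identity"]
NUM_LAYERS = 6


def toy_fitness(arch):
    # Hypothetical fitness: reward separable convs, lightly penalize pooling.
    return sum(op == "sep_conv3x3" for op in arch) - 0.5 * sum(op == "maxpool" for op in arch)


def mutate(arch, rng):
    """Copy the parent and change one randomly chosen layer to a different op."""
    child = list(arch)
    pos = rng.randrange(len(child))
    child[pos] = rng.choice([op for op in OPS if op != child[pos]])
    return child


def regularized_evolution(cycles=300, population_size=20, sample_size=5, seed=0):
    rng = random.Random(seed)
    population = collections.deque()
    history = []
    # Start from a population of random architectures.
    for _ in range(population_size):
        arch = [rng.choice(OPS) for _ in range(NUM_LAYERS)]
        population.append((arch, toy_fitness(arch)))
        history.append(population[-1])
    for _ in range(cycles):
        # Tournament selection: the fittest of a random sample becomes the parent.
        candidates = rng.sample(list(population), sample_size)
        parent = max(candidates, key=lambda t: t[1])
        child = mutate(parent[0], rng)
        population.append((child, toy_fitness(child)))
        history.append(population[-1])
        population.popleft()  # aging: remove the oldest member, not the worst
    return max(history, key=lambda t: t[1])


arch, fitness = regularized_evolution()
print(arch, fitness)
```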

Evolutionary search shares one major weakness with the reinforcement-learning approach. Both demand enormous compute, with some evolutionary runs reaching roughly 3,150 GPU days in published experiments. Each candidate in each generation still needs evaluation, which dominates the total cost. The method is also sensitive to choices like population size and mutation rate. Tuning those settings well is part science and part craft, much like managing overfitting and underfitting in standard training. Despite the cost, evolution remains attractive because it handles messy, non-differentiable search spaces with ease.

Evolutionary methods also offer a practical advantage in flexibility that gradient methods lack. They can optimize many objectives at once, such as accuracy, latency, and memory together. This makes them a natural fit for problems with hard hardware constraints. Researchers continue to refine them with smarter mutation rules and aging mechanisms. Some hybrid systems even pair evolution with cheaper performance predictors. The combination keeps evolutionary search relevant in a field that prizes efficiency.


Differentiable Search and the DARTS Breakthrough

Beyond those expensive methods, differentiable search changed the economics of the entire field. The key idea treats the discrete choice of operations as a continuous, learnable mixture. By relaxing the search space into a smooth function, the algorithm can optimize architecture choices with ordinary gradient descent. This approach, known as DARTS, collapsed search time from thousands of GPU days to just two or three. The method trains a single over-parameterized network where every candidate operation coexists with a learned weight. After training, the operations with the highest weights are kept and the rest are pruned, a process explained in this intuitive DARTS explanation.
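The continuous relaxation on a single edge can be sketched in plain Python. The four candidate operations are toy scalar functions standing in for convolutions, pooling, and skip connections, and the loss, input, and learning rate are arbitrary assumptions. Full DARTS also trains the network weights and alternates weight updates on training data with architecture updates on validation data; this sketch keeps only the architecture parameters `alphas` to show how gradient descent on a softmax mixture selects an operation.

```python
import math

# Toy stand-ins for the candidate operations on one edge (hypothetical).
CANDIDATE_OPS = {
    "none": lambda x: 0.0,
    "skip_connect": lambda x: x,
    "double": lambda x: 2.0 * x,    # pretend "conv"
    "relu": lambda x: max(0.0, x),  # pretend "pool"
}


def softmax(values):
    m = max(values)
    exps = [math.exp(v - m) for v in values]
    s = sum(exps)
    return [e / s for e in exps]


def mixed_op(x, alphas):
    """Continuous relaxation: softmax(alpha)-weighted sum of every candidate's output."""
    weights = softmax(alphas)
    return sum(w * op(x) for w, op in zip(weights, CANDIDATE_OPS.values()))


def alpha_gradient(x, alphas, target):
    """Analytic gradient of (mixed_op(x) - target) ** 2 with respect to each alpha."""
    weights = softmax(alphas)
    outputs = [op(x) for op in CANDIDATE_OPS.values()]
    mixed = sum(w * o for w, o in zip(weights, outputs))
    # For a softmax-weighted sum, d mixed / d alpha_k = w_k * (o_k - mixed).
    return [2.0 * (mixed - target) * w * (o - mixed) for w, o in zip(weights, outputs)]


def derive_op(alphas):
    """Discretize: keep the highest-weighted op, excluding 'none'."""
    names = list(CANDIDATE_OPS)
    ranked = sorted(range(len(names)), key=lambda k: alphas[k], reverse=True)
    return next(names[k] for k in ranked if names[k] != "none")


alphas = [0.0] * len(CANDIDATE_OPS)
for _ in range(200):
    # Pretend the data say the right output for x = 1.5 is 3.0.
    grads = alpha_gradient(1.5, alphas, target=3.0)
    alphas = [a - 0.5 * g for a, g in zip(alphas, grads)]
print(derive_op(alphas))  # double
```

As in DARTS, the zero (`none`) operation is excluded when the final discrete architecture is derived.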

The speed of DARTS made neural architecture search accessible to far more teams. Suddenly a search could run on a single GPU in a few days rather than on large GPU clusters for weeks. The gradient framing also let NAS reuse familiar deep learning tooling, such as PyTorch loss functions and optimizers. Researchers quickly built many variants to improve stability and final accuracy. This wave of work turned differentiable search into a default starting point for new projects, and the barrier to entry for serious architecture search dropped sharply.

DARTS is powerful, but it carries its own well-documented problems. The continuous relaxation can collapse toward simple operations like skip connections that train easily but generalize poorly. Results can also vary widely between runs, which raises real reproducibility concerns. The memory cost of holding every candidate operation at once can strain hardware on large spaces. Researchers address these issues with techniques like partial channel connections and regularization. Even with the flaws, the gains in speed reshaped expectations for the whole field.

One-Shot Supernets and Weight Sharing

Building on the differentiable idea, one-shot methods take weight sharing to its logical extreme. They train a single large network, called a supernet, that contains every architecture in the search space as a subnetwork. Because all candidates share weights inside the supernet, evaluating a design becomes nearly free after one training run. To score a candidate, the system simply activates its corresponding path through the supernet and measures accuracy. This clever reuse avoids training thousands of separate candidate models from scratch. The savings in compute are large enough to make broad searches practical, a point echoed in the Roboflow guide to NAS.
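The weight-sharing idea can be sketched with a deliberately tiny, hypothetical supernet: three layers, three toy operations per layer, and one shared scalar weight per (layer, operation) pair. To keep the arithmetic simple, each subnetwork's output is a sum of its chosen features rather than a true stack of layers. Training samples one path uniformly at random per step, in the spirit of single-path one-shot methods; afterwards, all 27 subnetworks are scored by forward passes alone.

```python
import itertools
import random

NUM_LAYERS = 3
# Toy per-operation feature functions (hypothetical stand-ins for real layers).
OPS = {"linear": lambda x: x, "square": lambda x: x * x, "bias": lambda x: 1.0}

# One shared parameter per (layer, op); every subnetwork reuses these same entries.
shared = {(layer, op): 0.0 for layer in range(NUM_LAYERS) for op in OPS}


def forward(path, x):
    """Output of the subnetwork that picks one op per layer (toy additive model)."""
    return sum(shared[(layer, op)] * OPS[op](x) for layer, op in enumerate(path))


def train_supernet(data, steps=3000, lr=0.02, seed=0):
    """Single-path uniform sampling: each step updates one random path's shared weights."""
    rng = random.Random(seed)
    for _ in range(steps):
        path = [rng.choice(list(OPS)) for _ in range(NUM_LAYERS)]
        x, y = rng.choice(data)
        err = forward(path, x) - y
        for layer, op in enumerate(path):
            shared[(layer, op)] -= lr * 2 * err * OPS[op](x)  # gradient of squared error


def evaluate(path, data):
    """Mean squared error with inherited weights: scoring needs no retraining."""
    return sum((forward(path, x) - y) ** 2 for x, y in data) / len(data)


# Hypothetical regression data.
data = [(i / 10, (i / 10) ** 2 + 0.5) for i in range(-10, 11)]
train_supernet(data)
# Rank all 3 ** 3 = 27 subnetworks with one forward pass over the data each.
ranking = sorted(itertools.product(OPS, repeat=NUM_LAYERS), key=lambda p: evaluate(p, data))
print(ranking[:3])
```

Because every path was trained jointly with the others, these inherited-weight scores are only estimates of how each subnetwork would do if trained alone.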

Weight sharing introduces a subtle but genuinely important concern about measured accuracy. The shared weights are a compromise that may not reflect how a standalone model would truly perform. A candidate that scores well inside the supernet can sometimes disappoint after full, independent training. Researchers reduce this gap with careful sampling and fairer training schedules for the supernet. Techniques like batch normalization help stabilize the shared training, as covered in this guide to batch normalization for faster training. The one-shot idea remains a cornerstone of modern, efficient search.

Zero-Cost Proxies and Training-Free Scoring

Beyond one-shot training, zero-cost proxies try to score an architecture without training it at all. These methods analyze a network at initialization and compute a number that predicts its eventual accuracy. A good proxy can rank candidate architectures in seconds rather than the hours that training demands. The signals come from properties like gradient flow, the structure of the network, or how it responds to a single batch of data. When the proxy correlates strongly with real accuracy, it filters out weak designs almost instantly. This idea has become one of the most active research areas in efficient search, as a study on evolving zero-cost proxies shows.
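One published training-free score, from the paper "Neural Architecture Search without Training" (often called NASWOT), records which ReLU units fire for each input in a batch at initialization and scores the network by the log-determinant of a kernel built from those binary activation codes; more distinct codes suggest the untrained network already separates inputs well. The sketch below is a simplified pure-Python version for small fully connected ReLU networks without biases; the layer widths and the random batch are assumptions for illustration.

```python
import math
import random


def init_mlp(widths, rng):
    """Random weights for a bias-free ReLU MLP with the given widths (input first)."""
    return [
        [[rng.gauss(0.0, 1.0 / math.sqrt(n_in)) for _ in range(n_in)] for _ in range(n_out)]
        for n_in, n_out in zip(widths[:-1], widths[1:])
    ]


def activation_code(layers, x):
    """Binary code: 1 where a ReLU unit is active for input x, across all layers."""
    code, h = [], x
    for weights in layers:
        pre = [sum(w * v for w, v in zip(row, h)) for row in weights]
        code.extend(1 if p > 0 else 0 for p in pre)
        h = [max(0.0, p) for p in pre]
    return code


def log_abs_det(matrix):
    """log|det(M)| via Gaussian elimination with partial pivoting."""
    m = [row[:] for row in matrix]
    n, total = len(m), 0.0
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        if abs(m[pivot][col]) < 1e-12:
            return float("-inf")  # singular kernel: some inputs look identical
        m[col], m[pivot] = m[pivot], m[col]
        total += math.log(abs(m[col][col]))
        for r in range(col + 1, n):
            factor = m[r][col] / m[col][col]
            for c in range(col, n):
                m[r][c] -= factor * m[col][c]
    return total


def naswot_score(widths, batch, seed=0):
    layers = init_mlp(widths, random.Random(seed))
    codes = [activation_code(layers, x) for x in batch]
    n_units = len(codes[0])
    # Kernel entry: units on which two inputs agree (N_A minus Hamming distance).
    kernel = [[n_units - sum(a != b for a, b in zip(ci, cj)) for cj in codes] for ci in codes]
    return log_abs_det(kernel)


rng = random.Random(42)
batch = [[rng.gauss(0.0, 1.0) for _ in range(10)] for _ in range(8)]
for widths in ([10, 16, 16], [10, 64, 64]):
    print(widths, round(naswot_score(widths, batch), 2))
```

Candidates would be ranked by this score, and only the highest-scoring ones would go on to full training.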

The appeal of training-free scoring goes well beyond its obvious raw speed. It slashes the carbon footprint of search, since most of the energy in NAS goes into training candidates. Teams can screen huge numbers of architectures and reserve full training for only the top few. This pairs well with Bayesian methods that decide which candidates to inspect next, a topic covered in this guide to Bayesian optimization in machine learning. The combination can find strong designs with a fraction of the usual compute. That kind of efficiency is exactly what resource-limited teams need most.

Zero-cost proxies are promising tools, but they remain clearly imperfect today. No single proxy works perfectly across every task, dataset, or search space. A proxy that ranks vision models well may fail on language or tabular problems. Researchers now ensemble several proxies together to improve reliability and reduce blind spots. They also validate proxy rankings against full training on a small sample of candidates. Used carefully, these methods point toward a future where search costs keep falling.
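Checking a proxy against a small fully trained sample, and combining several proxies, can be sketched with Kendall's rank correlation. The accuracies and proxy scores below are invented for illustration, and averaging ranks is just one simple way to ensemble proxies.

```python
from itertools import combinations


def kendall_tau(xs, ys):
    """Kendall rank correlation (tau-a): +1 means same ordering, -1 means reversed."""
    concordant = discordant = 0
    for i, j in combinations(range(len(xs)), 2):
        s = (xs[i] - xs[j]) * (ys[i] - ys[j])
        if s > 0:
            concordant += 1
        elif s < 0:
            discordant += 1
    n_pairs = len(xs) * (len(xs) - 1) / 2
    return (concordant - discordant) / n_pairs


def ranks(values):
    """Rank positions (0 = lowest); ties are broken by order, fine for a sketch."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    result = [0] * len(values)
    for rank, i in enumerate(order):
        result[i] = rank
    return result


def ensemble_proxy(*proxy_scores):
    """Combine several proxies by averaging each candidate's rank across them."""
    per_proxy = [ranks(scores) for scores in proxy_scores]
    return [sum(r[i] for r in per_proxy) / len(per_proxy) for i in range(len(proxy_scores[0]))]


# Hypothetical validation sample: five fully trained candidates and two proxies.
true_acc = [0.71, 0.74, 0.69, 0.77, 0.73]
proxy_a = [10.0, 14.0, 9.0, 12.0, 13.0]
proxy_b = [3.1, 2.9, 2.0, 4.0, 3.0]

print(round(kendall_tau(true_acc, proxy_a), 2))  # 0.6
print(round(kendall_tau(true_acc, proxy_b), 2))  # 0.4
print(round(kendall_tau(true_acc, ensemble_proxy(proxy_a, proxy_b)), 2))  # 0.9
```

On this made-up sample, the rank average agrees with true accuracy on 9 of the 10 candidate pairs and ties on the last; real ensembles will not always improve this neatly.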

Hardware-Aware Neural Architecture Search

Moving on to real deployment, hardware-aware search bakes device limits directly into the objective. Instead of optimizing accuracy alone, the search also rewards low latency, small memory use, and low power draw. A model that wins on accuracy but runs too slowly on a phone is useless for mobile products. Methods like MnasNet measure real latency on target devices during the search itself. This forces the algorithm to discover designs that are both accurate and genuinely fast in production. The result is models tuned for the exact hardware they will run on.
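One way to bake latency into the objective is the reward form reported for MnasNet, which multiplies accuracy by the ratio of measured latency to a target latency, raised to a small negative exponent. The sketch assumes accuracy and on-device latency are already measured; the two candidate models in the usage lines are hypothetical.

```python
def mnasnet_reward(accuracy, latency_ms, target_ms, alpha=-0.07, beta=-0.07):
    """Multi-objective reward of the form ACC * (LAT / T) ** w.

    w = alpha when the model meets the latency target T, beta when it misses.
    alpha = beta = -0.07 is the soft-constraint setting reported for MnasNet;
    alpha = 0, beta = -1 penalizes only models that exceed the target.
    """
    w = alpha if latency_ms <= target_ms else beta
    return accuracy * (latency_ms / target_ms) ** w


# Hypothetical candidates measured against an 80 ms target on some phone.
slow_but_accurate = mnasnet_reward(0.76, latency_ms=100, target_ms=80)
fast_and_close = mnasnet_reward(0.75, latency_ms=70, target_ms=80)
print(round(slow_but_accurate, 4), round(fast_and_close, 4))
```

With these numbers, the faster model earns the higher reward despite its slightly lower accuracy, which is the kind of trade-off the search is meant to surface.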

The payoff from this approach shows up clearly in published benchmarks. ProxylessNAS searched directly on target hardware and pruned costly computational paths during the process. Its GPU model reached 75.1 percent top-1 accuracy on ImageNet, about 3.1 percentage points better than MobileNetV2, at 5.1 milliseconds of latency. That kind of gain matters enormously for products that serve millions of requests. Hardware-aware search has become standard practice for teams shipping models to phones or edge chips. The technique ties model design to the realities of applying computer vision fundamentals on constrained devices.

Measuring hardware cost during search raises its own engineering puzzles. Building latency lookup tables for every device combination is slow and tedious work. Some methods use a single proxy device to estimate performance across a whole family of chips, as described in a paper on one proxy device for hardware-aware NAS. Others build small predictor models that estimate latency from a network's structure. These shortcuts keep hardware-aware search affordable without losing too much accuracy. The trade-off between precision and speed shows up again at this layer.
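A latency lookup table is the simplest version of this idea: measure each candidate operation once on the device, then estimate a whole network by summing its layers' entries. The operations, shapes, and millisecond values below are hypothetical placeholders, not measurements.

```python
# Hypothetical per-operation latencies (ms) measured once on a target device,
# keyed by (operation, input resolution, channels).
LATENCY_TABLE_MS = {
    ("conv3x3", 56, 64): 0.42,
    ("conv5x5", 56, 64): 0.95,
    ("sep_conv3x3", 56, 64): 0.21,
    ("conv3x3", 28, 128): 0.38,
    ("sep_conv3x3", 28, 128): 0.17,
    ("identity", 28, 128): 0.0,
}


def predict_latency(layers):
    """Estimate end-to-end latency as the sum of per-layer table lookups.

    Assumes layers run sequentially and ignores operator fusion, caching,
    and scheduling effects, which is why estimates get checked on the device.
    """
    return sum(LATENCY_TABLE_MS[layer] for layer in layers)


candidate_a = [("conv5x5", 56, 64), ("conv3x3", 28, 128)]
candidate_b = [("sep_conv3x3", 56, 64), ("sep_conv3x3", 28, 128)]
print(predict_latency(candidate_a), predict_latency(candidate_b))
```

A learned latency predictor replaces the table with a small model when the number of (operation, shape, device) combinations grows too large to measure exhaustively.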

Hardware-software co-design takes the optimization one step further. Here the search optimizes the network and the hardware configuration together as a single problem. This is valuable when teams control both the model and the accelerator it runs on. The approach can squeeze out gains that neither side could achieve alone. It is common in custom silicon projects at large technology companies. As specialized AI chips proliferate, this co-design mindset is likely to spread well beyond a few labs.

Where NAS Delivers Real Value

Looking at practical impact, NAS delivers the most value where efficiency is critical. Mobile and edge applications benefit hugely, since they run on tight power and memory budgets. Phone cameras, voice assistants, and on-device translation all rely on compact models that automated search helped design. These products need high accuracy without draining a battery or stalling the interface. Hand-designing such tightly constrained mobile models is slow and very error-prone. Automated search explores the trade-off space far more thoroughly than a human could.

Computer vision has been the flagship domain for NAS from the very beginning. Image classification, object detection, and segmentation all use architectures refined through search. Medical imaging is a fast-growing area, where specialized models can spot patterns in scans, building on ideas from convolutional and recurrent neural networks. Researchers have applied NAS to brain methylation prediction and other clinical tasks. The method tailors a model to the unusual structure of each dataset. That level of customization is hard to match with generic off-the-shelf architectures.

The reach of NAS now extends well beyond vision into many fields. It has been used for language models, speech recognition, and even scientific simulation. Generative models also benefit, including generative adversarial networks. Any domain with a clear metric and enough data is a candidate for automated design. As proxies and supernets cut costs, the list of viable applications keeps growing. The technology is steadily becoming a standard part of the machine learning toolkit.

The Risks and Limitations of Automated Model Design

Despite the progress, NAS carries real risks that deserve attention. The most obvious is cost, since the strongest classical methods can demand thousands of GPU days. That price excludes smaller teams and concentrates cutting-edge research inside a handful of well-funded labs. Reproducibility is another serious concern, because many methods give different results across runs. Search-space design also bakes in human bias, which limits what the algorithm can ever discover. A poorly chosen space quietly caps the quality of every result it produces.

There is also a danger of optimizing for the wrong target. A search can overfit to a benchmark and produce models that fail on real, messier data. Proxy scores can mislead, rewarding designs that look good cheaply but generalize poorly. Teams that trust the search blindly may ship fragile models without realizing it. Strong validation on independent data is the main defense against this trap, much like guarding against overfitting and underfitting. Treating search as one tool among many keeps these risks manageable.

Beyond technical risks, neural architecture search raises pressing questions about energy and fairness. Training thousands of candidate models consumes large amounts of electricity and produces real carbon emissions. The environmental cost of brute-force search has become a genuine ethical concern within the research community. Estimates of the carbon footprint of heavy training runs have prompted calls for greener methods. Zero-cost proxies and one-shot models are partly a response to this pressure. Cutting compute is not just about money, it is also about sustainability.

The concentration of resources also creates a fairness problem in the field. Only organizations with massive compute can run the most ambitious searches. This widens the gap between large technology companies and everyone else. Reporting from the field notes that some firms have struggled to keep carbon offset promises while expanding AI data centers. The result is an uneven playing field where access to hardware shapes who can innovate. Sharing pretrained search results and open benchmarks helps level that ground somewhat.

Transparency is the third ethical pillar. Even when a model is auto-designed, teams should understand and document its behavior. Automated design can obscure why a network is structured the way it is. Clear reporting of search spaces, costs, and limitations builds trust with users and regulators. Responsible teams treat documentation as part of the deliverable, not an afterthought. Combining efficiency with honesty is the path toward sustainable, trustworthy automated design.

The Future of Neural Architecture Search

Looking ahead, the future of NAS points toward cheaper and smarter search. Training-free evaluation through zero-cost proxies is maturing fast and reshaping how candidates get screened. The clear trend is to spend less compute on each candidate while exploring far more of the space. Supernets, predictors, and proxies all push in the same direction of efficiency. This shift makes serious search realistic for teams that once could never afford it. The democratization of model design is a major theme for the years ahead.

Large language models are now entering the search loop in surprising ways. New LLM-guided systems use a language model to propose and refine architectures intelligently. One 2025 method reported search costs dropping from days to minutes while matching strong baselines, according to work on LLM-driven hardware-aware NAS. The language model reflects on past attempts and suggests targeted improvements for the next round. This pairs naturally with zero-cost proxies that score the proposals cheaply. The fusion of LLMs and search could redefine how the next generation of models is built.

Hardware-software co-design and broader automation round out the road ahead. As custom AI chips multiply, searching for models and hardware together will grow more common. The line between architecture search and full automated machine learning continues to blur. Future systems may handle data preparation, search, and deployment as one seamless pipeline built on the fundamentals of what deep learning is. The endgame is a workflow where strong models emerge with minimal human tuning. That vision is closer than it seemed just a few years ago.

Chart from AIplusInfo: The Collapsing Cost of Neural Architecture Search (estimated search cost in GPU days across the main strategies, log-scaled bars). Source: gradient-based NAS evaluation (MDPI) and hardware-aware NAS research.

Key Insights on NAS

  • Differentiable methods cut search time from over 2,000 GPU days of reinforcement learning to roughly two or three GPU days (MDPI).
  • Some evolutionary searches reached around 3,150 GPU days, underscoring how expensive classical strategies became before efficiency gains arrived (MDPI).
  • ProxylessNAS reached 75.1 percent ImageNet top-1 accuracy, about 3.1 percentage points above MobileNetV2, at 5.1 milliseconds latency on a V100 GPU (arXiv).
  • Zero-cost proxies score an architecture at initialization in seconds instead of training it for hours, slashing search energy use (arXiv).
  • A 2025 LLM-guided NAS method reported search costs dropping from days to minutes while matching strong supernet baselines (arXiv).
  • Weight sharing and differentiable search have slashed search times from months to mere hours for many tasks (Roboflow).
  • A major survey organizes the field across more than 1,000 papers, covering search spaces, algorithms, and benchmarks (arXiv).

These numbers tell a single, clear story about the direction of automated design. The field began with brute-force methods that were powerful but far too expensive for most teams. Each new technique, from differentiable search to zero-cost proxies, attacked the same bottleneck of evaluation cost. As the price of search fell, the range of viable applications widened from research labs to mobile products. The arrival of LLM-guided search now promises another large drop in cost and effort. Taken together, the trend points toward model design that is faster, cheaper, and far more accessible.

Comparing the Major NAS Strategies

Stepping back from the individual methods, a direct comparison makes the trade-offs much easier to see. The right NAS strategy depends on your compute budget, your accuracy targets, and the hardware you plan to ship on. Reinforcement learning and evolution offer flexibility but demand enormous compute that most teams cannot afford. Differentiable search trades some stability for a massive reduction in search time on a single machine. One-shot and zero-cost methods push efficiency the furthest, though they introduce a gap between proxy scores and real accuracy. The table below lines up these strategies across the dimensions that matter most when you choose one.

| Dimension | Reinforcement Learning | Evolutionary Algorithms | Differentiable (DARTS) | One-Shot / Zero-Cost |
| --- | --- | --- | --- | --- |
| Typical search cost | ~2,000 GPU days | ~3,150 GPU days | 2-3 GPU days | Hours or less |
| Relative speed | Very slow | Very slow | Fast | Fastest |
| Reproducibility | Moderate | Moderate | Often unstable | Varies by proxy |
| Multi-objective support | Good | Excellent | Limited | Good with predictors |
| Memory demand | Low per candidate | Low per candidate | High (all ops at once) | High for supernet |
| Best use case | Research with big budgets | Hard hardware constraints | Fast single-machine search | Massive candidate screening |
| Main weakness | Extreme compute cost | Sensitive to settings | Skip-connect collapse | Proxy-to-real accuracy gap |
| Maturity | Established | Established | Widely adopted | Rapidly emerging |

NAS in Practice: Models It Discovered

NASNet and the First Searched ImageNet Models

NASNet showed the world that a machine could design a competitive vision network from scratch. Researchers ran a reinforcement-learning controller that discovered reusable building blocks, then stacked those blocks to scale the model. The discovered cells were transferred from a small proxy dataset to full ImageNet classification, where they reached about 82.7 percent top-1 accuracy. The measurable outcome was a model that rivaled the best hand-designed networks of its era on standard benchmarks. The clear limitation was cost, since the search consumed enormous compute that few teams could ever replicate. The design principles behind it are documented in research on NASNet and EfficientNet design. NASNet proved the concept while exposing the efficiency problem the field would spend years solving.

MnasNet Tuned Directly for Mobile Phones

MnasNet was built to answer a practical question that pure accuracy searches ignored. The team ran a platform-aware search that measured real inference latency on mobile phones during the process itself. This implementation folded a speed objective directly into the reward alongside the classification score. The measurable outcome was roughly 75 percent top-1 ImageNet accuracy with far lower latency than earlier mobile networks. The limitation was that latency measured on one device did not always transfer cleanly to other chips. The platform-aware approach is described in the MnasNet paper. It became a template for how to search with deployment constraints in mind.

EfficientNet and Principled Model Scaling

EfficientNet combined a searched base network with a clean rule for scaling it up. Engineers used neural architecture search to find a strong baseline, then scaled depth, width, and resolution together with a compound coefficient. This implementation produced a whole family of models spanning small to large compute budgets. The measurable outcome reached about 84 percent top-1 accuracy with far fewer parameters than comparable hand-designed networks. The limitation was that the compound scaling rule worked best for the image domains it was tuned on. The working principles are covered in the same architecture design study. EfficientNet showed that search and smart scaling together beat either approach alone.
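The compound scaling rule can be written as a short helper. In the EfficientNet paper, depth, width, and input resolution are scaled as d = α^φ, w = β^φ, and r = γ^φ, with α · β² · γ² ≈ 2 so that FLOPs roughly double for each unit increase in φ, and a small grid search on the base model gave α = 1.2, β = 1.1, and γ = 1.15. The helper below applies those values to a 224-pixel base; the released B1 to B7 models use rounded and hand-adjusted settings, so treat its outputs as illustrative.

```python
def compound_scaling(phi, alpha=1.2, beta=1.1, gamma=1.15, base_resolution=224):
    """Return depth and width multipliers, input resolution, and approximate FLOPs growth.

    Depth scales by alpha**phi, width by beta**phi, and resolution by gamma**phi.
    FLOPs grow roughly with depth * width**2 * resolution**2, i.e.
    (alpha * beta**2 * gamma**2)**phi, which is about 2**phi for these values.
    """
    depth = alpha ** phi
    width = beta ** phi
    resolution = round(base_resolution * gamma ** phi)
    flops_factor = (alpha * beta ** 2 * gamma ** 2) ** phi
    return depth, width, resolution, flops_factor


for phi in range(4):
    d, w, r, f = compound_scaling(phi)
    print(f"phi={phi}: depth x{d:.2f}, width x{w:.2f}, resolution {r}px, FLOPs x{f:.2f}")
```

Tying all three dimensions to one coefficient is the design choice that lets a single searched baseline yield a whole family of models for different compute budgets.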

NAS in Production: Deployment Case Studies

Case Study: ProxylessNAS for GPU Inference

ProxylessNAS tackled the problem that proxy tasks hid the true cost of a model on real hardware. The team searched directly on the target hardware and pruned expensive computational paths as the search ran. This implementation avoided the usual gap between a proxy dataset and the deployment environment. The measurable outcome was a GPU model reaching 75.1 percent ImageNet accuracy, about 3.1 percent above MobileNetV2, at 5.1 milliseconds latency. The limitation was that searching directly on hardware tied the result tightly to that specific device. The method and its proxy-device extension appear in a paper on hardware-aware NAS with one proxy device. It demonstrated that searching on real hardware pays off in production speed.

Case Study: Hardware-Aware Search Across Device Types

A broad hardware-aware study tested whether one search method could serve many different chips. The researchers ran searches that explicitly modeled latency on several hardware platforms during evaluation. This implementation produced models tuned for each device rather than a single generic network. The measurable outcome delivered up to 3.47 percent higher accuracy and a 6.35 times speedup over prior NAS methods across three hardware types. The limitation was the engineering effort needed to build accurate latency models for every target platform. The findings are surveyed in an ACM Computing Surveys paper on NAS from a hardware perspective. It confirmed that hardware awareness generalizes across very different deployment targets.

Case Study: Deep Learning for Brain Methylation Prediction

Medical research offers a striking example of automated design applied to a sensitive clinical problem. A team deployed deep learning to improve prediction of brain methylation variants from complex biological data. This implementation tailored a model to the unusual structure of the genomic dataset rather than reusing a generic vision network. The measurable outcome was a gain of several percentage points in predictive accuracy over baseline approaches on the methylation task. The limitation was the need for careful validation, since clinical models must be trustworthy before any real use. The work is reported in a study on deep learning for brain methylation prediction. It shows how custom-designed models extend the reach of search into high-stakes domains.

Common Questions About NAS Explained

What is NAS in simple terms?

NAS is a way to let algorithms design neural networks instead of humans. It tries many candidate designs and keeps the ones that perform best. The goal is a model that is both accurate and efficient for a specific task.

How does NAS actually work?

It runs three parts together in one loop. A search space lists the possible designs, a search strategy picks which ones to try, and an evaluation method scores each one. The strategy learns from the scores and proposes better candidates over time.
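That loop can be sketched in a few dozen lines. This is a minimal illustration and does not reproduce any particular published system. The search space, the mutation rule, and especially `evaluate` are hypothetical, and `evaluate` is a made-up score standing in for actually training each model. The strategy shown is a simple regularized-evolution-style loop. Reinforcement learning or gradient-based strategies would fill the same slot.

```python
import random

# Hypothetical search space: each of NUM_LAYERS layers picks an op and a width.
OPS = ["conv3x3", "conv5x5", "skip"]
WIDTHS = [16, 32, 64]
NUM_LAYERS = 4


def sample_architecture(rng):
    """Search space: draw a random architecture as a list of (op, width) layers."""
    return [(rng.choice(OPS), rng.choice(WIDTHS)) for _ in range(NUM_LAYERS)]


def mutate(arch, rng):
    """Search strategy helper: resample one randomly chosen layer."""
    child = list(arch)
    child[rng.randrange(NUM_LAYERS)] = (rng.choice(OPS), rng.choice(WIDTHS))
    return child


def evaluate(arch):
    """Evaluation stand-in: a made-up score instead of real training.

    Convolutions earn credit that grows with width, and every layer pays a
    small width-proportional cost. A real search would train the model and
    measure validation accuracy (and often latency).
    """
    score = 0.0
    for op, width in arch:
        if op != "skip":
            score += 1.0 + width / 64
        score -= 0.01 * width
    return score


def evolutionary_search(iterations=200, population_size=10, sample_size=3, seed=0):
    """Mutate good parents, age out the oldest member, and track the best seen."""
    rng = random.Random(seed)
    population = [(evaluate(a), a) for a in (sample_architecture(rng) for _ in range(population_size))]
    best = max(population, key=lambda s: s[0])
    for _ in range(iterations):
        # Tournament selection: the best of a small random sample becomes the parent.
        parent = max(rng.sample(population, sample_size), key=lambda s: s[0])[1]
        child = mutate(parent, rng)
        entry = (evaluate(child), child)
        population.append(entry)
        population.pop(0)  # remove the oldest member, not the worst
        if entry[0] > best[0]:
            best = entry
    return best


score, arch = evolutionary_search()
print(round(score, 2), arch)
```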

What is the difference between NAS and AutoML?

AutoML automates the broader machine learning workflow, including data preparation and hyperparameter tuning. NAS is the specialized part of that toolkit that focuses only on designing the network structure.

Why is NAS so computationally expensive?

Each candidate design usually needs training to measure how good it is. Classical methods evaluate thousands of candidates, so the training cost adds up fast. Early searches consumed thousands of GPU days before efficient methods reduced the burden.
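The arithmetic behind "adds up fast" is simple multiplication. The numbers below are hypothetical and only show the scale involved. If each candidate took about 4 GPU hours to train and a search trained 12,000 of them, the bill would land at about 2,000 GPU days. The second function keeps the same made-up numbers and shows how the total changes when a cheap score screens every candidate and only a small fraction is trained.

```python
def brute_force_gpu_days(num_candidates, gpu_hours_per_candidate):
    """GPU days if every candidate is trained separately."""
    return num_candidates * gpu_hours_per_candidate / 24


def screened_gpu_days(num_candidates, gpu_hours_per_candidate, keep_fraction, screen_seconds_per_candidate):
    """GPU days if a cheap score screens all candidates and only the top fraction is trained."""
    screening_hours = num_candidates * screen_seconds_per_candidate / 3600
    training_hours = num_candidates * keep_fraction * gpu_hours_per_candidate
    return (screening_hours + training_hours) / 24


# Hypothetical numbers: 12,000 candidates at 4 GPU hours each.
print(brute_force_gpu_days(12_000, 4))
print(screened_gpu_days(12_000, 4, keep_fraction=0.01, screen_seconds_per_candidate=1))
```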

What is DARTS in NAS?

DARTS stands for differentiable architecture search, a popular gradient-based search method. It treats discrete design choices as a continuous mixture that gradient descent can optimize. This trick cut search time from thousands of GPU days to just two or three.
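The continuous relaxation can be illustrated on a single edge with three toy operations. This is a minimal sketch and does not reproduce the DARTS implementation. The ops act on a scalar, there are no network weights, and the bilevel setup that DARTS uses (network weights trained on training data, architecture weights on validation data) is collapsed into one loss. It still shows the core trick. The edge's output is a softmax-weighted mix of every op, the mixing logits `alphas` receive gradient updates, and the final architecture keeps the op with the largest weight. The op names are hypothetical.

```python
import math

# Hypothetical candidate operations on one edge, acting on a scalar input.
CANDIDATE_OPS = {
    "identity": lambda x: x,
    "double": lambda x: 2.0 * x,
    "zero": lambda x: 0.0,
}


def softmax(logits):
    m = max(logits)
    exps = [math.exp(v - m) for v in logits]
    total = sum(exps)
    return [e / total for e in exps]


def mixed_op(x, alphas):
    """Continuous relaxation: a softmax-weighted sum of every candidate op."""
    weights = softmax(alphas)
    return sum(w * op(x) for w, op in zip(weights, CANDIDATE_OPS.values()))


def search_edge(x, target, steps=200, lr=0.5):
    """Gradient descent on the architecture logits, then discretize with argmax."""
    alphas = [0.0] * len(CANDIDATE_OPS)
    for _ in range(steps):
        weights = softmax(alphas)
        outputs = [op(x) for op in CANDIDATE_OPS.values()]
        out = sum(w * o for w, o in zip(weights, outputs))
        # Loss = (out - target)^2, and d(out)/d(alpha_j) = w_j * (o_j - out).
        grads = [2.0 * (out - target) * w * (o - out) for w, o in zip(weights, outputs)]
        alphas = [a - lr * g for a, g in zip(alphas, grads)]
    names = list(CANDIDATE_OPS)
    best = max(range(len(names)), key=lambda j: alphas[j])
    return names[best], alphas


chosen, alphas = search_edge(x=1.0, target=2.0)
print(chosen)
```

Because the target is twice the input, the gradient steadily raises the logit of the doubling op and lowers the others. The argmax step then turns the soft mixture back into one discrete choice.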

What are zero-cost proxies?

Zero-cost proxies estimate a network's likely quality without training it. They analyze the model at initialization, often using a single minibatch, and produce a score in seconds. When the score correlates well with real accuracy, weak designs can be filtered out almost instantly.
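One concrete zero-cost proxy is the score from "Neural Architecture Search without Training" (Mellor et al.), often called NASWOT. It feeds a minibatch through an untrained ReLU network and records which units fire for each input as a binary code. It then builds a kernel whose entries count how many units two inputs' codes agree on, and uses the kernel's log-determinant as the score. Networks that separate inputs more distinctly score higher. The sketch below is a pure-Python toy under stated assumptions: the candidates are hypothetical plain MLPs described by their layer sizes, and the batch is random data rather than real images.

```python
import math
import random


def random_relu_mlp(layer_sizes, rng):
    """Randomly initialized (untrained) weight matrices for a ReLU MLP."""
    return [
        [[rng.gauss(0.0, 1.0 / math.sqrt(n_in)) for _ in range(n_in)] for _ in range(n_out)]
        for n_in, n_out in zip(layer_sizes[:-1], layer_sizes[1:])
    ]


def activation_code(weights, x):
    """Binary code recording which ReLU units fire for input x."""
    code, h = [], x
    for layer in weights:
        pre = [sum(w * v for w, v in zip(row, h)) for row in layer]
        code.extend(1 if p > 0 else 0 for p in pre)
        h = [max(p, 0.0) for p in pre]
    return code


def log_abs_det(matrix):
    """log|det(M)| via Gaussian elimination with partial pivoting."""
    a = [row[:] for row in matrix]
    n, total = len(a), 0.0
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(a[r][col]))
        if abs(a[pivot][col]) < 1e-12:
            return float("-inf")  # singular: some inputs are indistinguishable
        a[col], a[pivot] = a[pivot], a[col]
        total += math.log(abs(a[col][col]))
        for r in range(col + 1, n):
            factor = a[r][col] / a[col][col]
            for c in range(col, n):
                a[r][c] -= factor * a[col][c]
    return total


def naswot_score(layer_sizes, batch, seed=0):
    """Score an untrained network: log-det of the activation-agreement kernel."""
    weights = random_relu_mlp(layer_sizes, random.Random(seed))
    codes = [activation_code(weights, x) for x in batch]
    n_units = len(codes[0])
    # K[i][j] = number of units minus the Hamming distance between codes i and j.
    kernel = [[n_units - sum(a != b for a, b in zip(ci, cj)) for cj in codes] for ci in codes]
    return log_abs_det(kernel)


def filter_candidates(candidates, batch, keep=2):
    """Rank untrained candidates by the proxy and keep the top `keep`."""
    return sorted(candidates, key=lambda sizes: naswot_score(sizes, batch), reverse=True)[:keep]


data_rng = random.Random(1)
batch = [[data_rng.gauss(0.0, 1.0) for _ in range(8)] for _ in range(6)]
print(filter_candidates([[8, 16], [8, 32], [8, 32, 32]], batch, keep=2))
```

Whether this ranking tracks trained accuracy depends on the search space, which is exactly the correlation condition in the answer above.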

Which models were created using NAS?

NASNet, MnasNet, and EfficientNet were all discovered or refined through search. These vision models power many image classification and mobile applications. They often beat hand-designed networks on both accuracy and efficiency.

Is NAS only for computer vision?

No. Vision was the first big domain, but the field has expanded widely. NAS now applies to language, speech, generative models, and scientific tasks. Any problem with a clear metric and enough data is a candidate.

What is hardware-aware NAS?

Hardware-aware search adds device constraints like latency and memory to the objective. The algorithm rewards designs that run fast on the target chip, not just accurate ones. This produces models tuned for phones, edge devices, and specialized accelerators.
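MnasNet is a well-known example of rewarding speed as well as accuracy. As we read the MnasNet paper, its reward multiplies accuracy by a power of the ratio between measured latency and a target latency T: ACC(m) × [LAT(m)/T]^w. Here w = α when the model meets the target and w = β otherwise, and the paper's soft-constraint setting uses α = β = −0.07. The sketch below encodes that formula. The two candidate models and the 80 ms target are hypothetical.

```python
def mnas_reward(accuracy, latency_ms, target_ms, alpha=-0.07, beta=-0.07):
    """Accuracy scaled by (latency / target) ** w, a soft latency penalty."""
    w = alpha if latency_ms <= target_ms else beta
    return accuracy * (latency_ms / target_ms) ** w


# Hypothetical candidates on a device with an 80 ms latency target.
slower_model = mnas_reward(0.76, latency_ms=100, target_ms=80)
faster_model = mnas_reward(0.75, latency_ms=70, target_ms=80)
print(slower_model, faster_model)
```

With these made-up numbers, the faster model earns the higher reward despite slightly lower accuracy. Passing `alpha=0.0, beta=-1.0` instead leaves models that meet the target unpenalized and penalizes slow ones much more heavily, which behaves more like a hard constraint.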

How long does NAS take?

It depends heavily on the method. Classical reinforcement learning searches could take thousands of GPU days, while DARTS takes two or three. Zero-cost proxies and one-shot supernets can finish in hours or less.

What are the main risks of using NAS?

The big risks are high compute cost, carbon footprint, and poor reproducibility. Search-space bias can also quietly cap the quality of any result. Models can overfit a proxy task and fail on real, messier data.

Can small teams use NAS?

Yes, recent efficiency advances have made NAS far more accessible. Differentiable search can run on a single GPU workstation in a few days for many problems. Zero-cost proxies and shared benchmarks lower the barrier even further for small teams.

How will LLMs change NAS?

Large language models can now propose candidate architectures and refine them based on evaluation feedback. A 2025 method reported search costs falling from days to minutes using this approach. LLMs paired with zero-cost proxies could redefine how future models get designed.