Data Structures and Algorithms
See recent articles
Showing new listings for Thursday, 1 May 2025
- [1] arXiv:2504.21175 [pdf, html, other]
-
Title: Tight Bounds for Heavy-Hitters and Moment Estimation in the Sliding Window ModelSubjects: Data Structures and Algorithms (cs.DS)
We consider the heavy-hitters and $F_p$ moment estimation problems in the sliding window model. For $F_p$ moment estimation with $1<p\leq 2$, we show that it is possible to give a $(1\pm \epsilon)$ multiplicative approximation to the $F_p$ moment with $2/3$ probability on any given window of size $n$ using $\tilde{O}(\frac{1}{\epsilon^p}\log^2 n + \frac{1}{\epsilon^2}\log n)$ bits of space. We complement this result with a lower bound showing that our algorithm gives tight bounds up to factors of $\log\log n$ and $\log\frac{1}{\epsilon}.$ As a consequence of our $F_2$ moment estimation algorithm, we show that the heavy-hitters problem can be solved on an arbitrary window using $O(\frac{1}{\epsilon^2}\log^2 n)$ space which is tight.
- [2] arXiv:2504.21471 [pdf, html, other]
-
Title: Efficiently Finding All Minimal and Shortest Absent Subsequences in a StringSubjects: Data Structures and Algorithms (cs.DS); Formal Languages and Automata Theory (cs.FL)
Given a string $w$, another string $v$ is said to be a subsequence of $w$ if $v$ can be obtained from $w$ by removing some of its letters; on the other hand, $v$ is called an absent subsequence of $w$ if $v$ is not a subsequence of $w$. The existing literature on absent subsequences focused on understanding, for a string $w$, the set of its shortest absent subsequences (i.e., the shortest strings which are absent subsequences of $w$) and that of its minimal absent subsequences (i.e., those strings which are absent subsequences of $w$ but whose every proper subsequence occurs in $w$). Our contributions to this area of research are the following. Firstly, we present optimal algorithms (with linear time preprocessing and output-linear delay) for the enumeration of the shortest and, respectively, minimal absent subsequences. Secondly, we present optimal algorithms for the incremental enumeration of these strings with linear time preprocessing and constant delay; in this setting, we only output short edit-scripts showing how the currently enumerated string differs from the previous one. Finally, we provide an efficient algorithm for identifying a longest minimal absent subsequence of a string. All our algorithms improve the state-of-the-art results for the aforementioned problems.
- [3] arXiv:2504.21750 [pdf, html, other]
-
Title: Online Knapsack Problems with EstimatesComments: 20 pages, 2 figuresSubjects: Data Structures and Algorithms (cs.DS)
Imagine you are a computer scientist who enjoys attending conferences or workshops within the year. Sadly, your travel budget is limited, so you must select a subset of events you can travel to.
When you are aware of all possible events and their costs at the beginning of the year, you can select the subset of the possible events that maximizes your happiness and is within your budget.
On the other hand, if you are blind about the options, you will likely have a hard time when trying to decide if you want to register somewhere or not, and will likely regret decisions you made in the future.
These scenarios can be modeled by knapsack variants, either by an offline or an online problem. However, both scenarios are somewhat unrealistic:
Usually, you will not know the exact costs of each workshop at the beginning of the year. The online version, however, is too pessimistic, as you might already know which options there are and how much they cost roughly. At some point, you have to decide whether to register for some workshop, but then you are aware of the conference fee and the flight and hotel prices.
We model this problem within the setting of online knapsack problems with estimates: in the beginning, you receive a list of potential items with their estimated size as well as the accuracy of the estimates. Then, the items are revealed one by one in an online fashion with their actual size, and you need to decide whether to take one or not. In this article, we show a best-possible algorithm for each estimate accuracy $\delta$ (i.e., when each actual item size can deviate by $\pm \delta$ from the announced size) for both the simple knapsack and the simple knapsack with removability. - [4] arXiv:2504.21777 [pdf, html, other]
-
Title: Near-Optimal Distributed Ruling Sets for Trees and High-Girth GraphsSubjects: Data Structures and Algorithms (cs.DS); Distributed, Parallel, and Cluster Computing (cs.DC)
Given a graph $G=(V,E)$, a $\beta$-ruling set is a subset $S\subseteq V$ that is i) independent, and ii) every node $v\in V$ has a node of $S$ within distance $\beta$. In this paper we present almost optimal distributed algorithms for finding ruling sets in trees and high girth graphs in the classic LOCAL model. As our first contribution we present an $O(\log\log n)$-round randomized algorithm for computing $2$-ruling sets on trees, almost matching the $\Omega(\log\log n/\log\log\log n)$ lower bound given by Balliu et al. [FOCS'20]. Second, we show that $2$-ruling sets can be solved in $\widetilde{O}(\log^{5/3}\log n)$ rounds in high-girth graphs. Lastly, we show that $O(\log\log\log n)$-ruling sets can be computed in $\widetilde{O}(\log\log n)$ rounds in high-girth graphs matching the lower bound up to triple-log factors. All of these results either improve polynomially or exponentially on the previously best algorithms and use a smaller domination distance $\beta$.
New submissions (showing 4 of 4 entries)
- [5] arXiv:2504.21244 (cross-list from math.CO) [pdf, html, other]
-
Title: The Metric Dimension of Sparse Random GraphsComments: 23 pages, 0 figuresSubjects: Combinatorics (math.CO); Data Structures and Algorithms (cs.DS); Social and Information Networks (cs.SI); Probability (math.PR)
In 2013, Bollobás, Mitsche, and Pralat at gave upper and lower bounds for the likely metric dimension of random Erdős-Rényi graphs $G(n,p)$ for a large range of expected degrees $d=pn$. However, their results only apply when $d \ge \log^5 n$, leaving open sparser random graphs with $d < \log^5 n$. Here we provide upper and lower bounds on the likely metric dimension of $G(n,p)$ from just above the connectivity transition, i.e., where $d=pn=c \log n$ for some $c > 1$, up to $d=\log^5 n$. Our lower bound technique is based on an entropic argument which is more general than the use of Suen's inequality by Bollobás, Mitsche, and Pralat, whereas our upper bound is similar to theirs.
- [6] arXiv:2504.21564 (cross-list from quant-ph) [pdf, html, other]
-
Title: Simulating quantum collision models with Hamiltonian simulations using early fault-tolerant quantum computersComments: 22+7 pages, 5 Figures. Happy International Labour Day!Subjects: Quantum Physics (quant-ph); Data Structures and Algorithms (cs.DS)
We develop randomized quantum algorithms to simulate quantum collision models, also known as repeated interaction schemes, which provide a rich framework to model various open-system dynamics. The underlying technique involves composing time evolutions of the total (system, bath, and interaction) Hamiltonian and intermittent tracing out of the environment degrees of freedom. This results in a unified framework where any near-term Hamiltonian simulation algorithm can be incorporated to implement an arbitrary number of such collisions on early fault-tolerant quantum computers: we do not assume access to specialized oracles such as block encodings and minimize the number of ancilla qubits needed. In particular, using the correspondence between Lindbladian evolution and completely positive trace-preserving maps arising out of memoryless collisions, we provide an end-to-end quantum algorithm for simulating Lindbladian dynamics. For a system of $n$-qubits, we exhaustively compare the circuit depth needed to estimate the expectation value of an observable with respect to the reduced state of the system after time $t$ while employing different near-term Hamiltonian simulation techniques, requiring at most $n+2$ qubits in all. We compare the CNOT gate counts of the various approaches for estimating the Transverse Field Magnetization of a $10$-qubit XX-Heisenberg spin chain under amplitude damping. Finally, we also develop a framework to efficiently simulate an arbitrary number of memory-retaining collisions, i.e., where environments interact, leading to non-Markovian dynamics. Overall, our methods can leverage quantum collision models for both Markovian and non-Markovian dynamics on early fault-tolerant quantum computers, shedding light on the advantages and limitations of simulating open systems dynamics using this framework.
- [7] arXiv:2504.21601 (cross-list from math.GT) [pdf, html, other]
-
Title: Efficient Decomposition of Forman-Ricci Curvature on Vietoris-Rips Complexes and Data ApplicationsDanillo Barros de Souza, Jonatas Teodomiro, Fernando A. N. Santos, Mengjun Ding, Weiqiang Sun, Mathieu Desroches, Jürgen Nost, Serafim RodriguesSubjects: Geometric Topology (math.GT); Computational Geometry (cs.CG); Discrete Mathematics (cs.DM); Data Structures and Algorithms (cs.DS); Combinatorics (math.CO)
Discrete Forman-Ricci curvature (FRC) is an efficient tool that characterizes essential geometrical features and associated transitions of real-world networks, extending seamlessly to higher-dimensional computations in simplicial complexes. In this article, we provide two major advancements: First, we give a decomposition for FRC, enabling local computations of FRC. Second, we construct a set-theoretical proof enabling an efficient algorithm for the local computation of FRC in Vietoris-Rips (VR) this http URL, this approach reveals critical information and geometric insights often overlooked by conventional classification techniques. Our findings open new avenues for geometric computations in VR complexes and highlight an essential yet under-explored aspect of data classification: the geometry underpinning statistical patterns.
- [8] arXiv:2504.21781 (cross-list from cs.DC) [pdf, html, other]
-
Title: Message Optimality and Message-Time Trade-offs for APSP and BeyondComments: Accepted to PODC 2025, abstract shortened to fit arXiv constraintsSubjects: Distributed, Parallel, and Cluster Computing (cs.DC); Data Structures and Algorithms (cs.DS)
Round complexity is an extensively studied metric of distributed algorithms. In contrast, our knowledge of the \emph{message complexity} of distributed computing problems and its relationship (if any) with round complexity is still quite limited. To illustrate, for many fundamental distributed graph optimization problems such as (exact) diameter computation, All-Pairs Shortest Paths (APSP), Maximum Matching etc., while (near) round-optimal algorithms are known, message-optimal algorithms are hitherto unknown. More importantly, the existing round-optimal algorithms are not message-optimal. This raises two important questions: (1) Can we design message-optimal algorithms for these problems? (2) Can we give message-time tradeoffs for these problems in case the message-optimal algorithms are not round-optimal?
In this work, we focus on a fundamental graph optimization problem, \emph{All Pairs Shortest Path (APSP)}, whose message complexity is still unresolved. We present two main results in the CONGEST model: (1) We give a message-optimal (up to logarithmic factors) algorithm that solves weighted APSP, using $\tilde{O}(n^2)$ messages. This algorithm takes $\tilde{O}(n^2)$ rounds. (2) For any $0 \leq \varepsilon \le 1$, we show how to solve unweighted APSP in $\tilde{O}(n^{2-\varepsilon })$ rounds and $\tilde{O}(n^{2+\varepsilon })$ messages. At one end of this smooth trade-off, we obtain a (nearly) message-optimal algorithm using $\tilde{O}(n^2)$ messages (for $\varepsilon = 0$), whereas at the other end we get a (nearly) round-optimal algorithm using $\tilde{O}(n)$ rounds (for $\varepsilon = 1$). This is the first such message-time trade-off result known.
Cross submissions (showing 4 of 4 entries)
- [9] arXiv:2308.11401 (replaced) [pdf, html, other]
-
Title: Parameterized Complexity of Simultaneous PlanarityComments: Appears in the Proceedings of the 31st International Symposium on Graph Drawing and Network Visualization (GD 2023)Subjects: Data Structures and Algorithms (cs.DS)
Given $k$ input graphs $G_1, \dots ,G_k$, where each pair $G_i$, $G_j$ with $i \neq j$ shares the same graph $G$, the problem Simultaneous Embedding With Fixed Edges (SEFE) asks whether there exists a planar drawing for each input graph such that all drawings coincide on $G$. While SEFE is still open for the case of two input graphs, the problem is NP-complete for $k \geq 3$ [Schaefer, JGAA 13]. In this work, we explore the parameterized complexity of SEFE. We show that SEFE is FPT with respect to $k$ plus the vertex cover number or the feedback edge set number of the the union graph $G^\cup = G_1 \cup \dots \cup G_k$. Regarding the shared graph $G$, we show that SEFE is NP-complete, even if $G$ is a tree with maximum degree 4. Together with a known NP-hardness reduction [Angelini et al., TCS 15], this allows us to conclude that several parameters of $G$, including the maximum degree, the maximum number of degree-1 neighbors, the vertex cover number, and the number of cutvertices are intractable. We also settle the tractability of all pairs of these parameters. We give FPT algorithms for the vertex cover number plus either of the first two parameters and for the number of cutvertices plus the maximum degree, whereas we prove all remaining combinations to be intractable.
- [10] arXiv:2401.08019 (replaced) [pdf, html, other]
-
Title: Centrality of shortest paths: Algorithms and complexity resultsSubjects: Data Structures and Algorithms (cs.DS); Computational Complexity (cs.CC); Optimization and Control (math.OC)
The degree centrality of a node, defined as the number of nodes adjacent to it, is often used as a measure of importance of a node to the structure of a network. This metric can be extended to paths in a network, where the degree centrality of a path is defined as the number of nodes adjacent to it. In this paper, we reconsider the problem of finding the most degree-central shortest path in an unweighted network. We propose a polynomial algorithm with the worst-case running time of $O(|E||V|^2\Delta(G))$, where $|V|$ is the number of vertices in the network, $|E|$ is the number of edges in the network, and $\Delta(G)$ is the maximum degree of the graph. We conduct a numerical study of our algorithm on synthetic and real-world networks and compare our results to the existing literature. In addition, we show that the same problem is NP-hard when a weighted graph is considered. Furthermore, we consider other centrality measures, such as the betweenness and closeness centrality, showing that the problem of finding the most betweenness-central shortest path is solvable in polynomial time and finding the most closeness-central shortest path is NP-hard, regardless of whether the graph is weighted or not.
- [11] arXiv:2502.15320 (replaced) [pdf, html, other]
-
Title: Adversarially-Robust Gossip Algorithms for Approximate Quantile and Mean ComputationsComments: 36 pagesSubjects: Data Structures and Algorithms (cs.DS); Distributed, Parallel, and Cluster Computing (cs.DC); Optimization and Control (math.OC)
This paper presents gossip algorithms for aggregation tasks that demonstrate both robustness to adversarial corruptions of any order of magnitude and optimality across a substantial range of these corruption levels. Gossip algorithms distribute information in a scalable and efficient way by having random pairs of nodes exchange small messages. Value aggregation problems are of particular interest in this setting as they occur frequently in practice and many elegant algorithms have been proposed for computing aggregates and statistics such as averages and quantiles. An important and well-studied advantage of gossip algorithms is their robustness to message delays, network churn, and unreliable message transmissions. These crucial robustness guarantees however only hold if all nodes follow the protocol and no messages are corrupted. In this paper, we remedy this by providing a framework to model both adversarial participants and message corruptions in gossip-style communications by allowing an adversary to control a small fraction of the nodes or corrupt messages arbitrarily. Despite this very powerful and general corruption model, we show that one can design robust gossip algorithms for many important aggregation problems. Our algorithms guarantee that almost all nodes converge to an approximately correct answer with optimal efficiency and essentially as fast as without corruptions. The design of adversarially-robust gossip algorithms poses completely new challenges. Despite this, our algorithms remain very simple variations of known non-robust algorithms with often only subtle changes to avoid non-compliant nodes gaining too much influence over outcomes. While our algorithms remain simple, their analysis is much more complex and often requires a completely different approach than the non-adversarial setting.
- [12] arXiv:2503.17264 (replaced) [pdf, html, other]
-
Title: A 3.3904-Competitive Online Algorithm for List Update with Uniform CostsMateusz Basiak, Marcin Bienkowski, Martin Böhm, Marek Chrobak, Łukasz Jeż, Jiří Sgall, Agnieszka TatarczukSubjects: Data Structures and Algorithms (cs.DS)
We consider the List Update problem where the cost of each swap is assumed to be 1. This is in contrast to the ``standard'' model, in which an algorithm is allowed to swap the requested item with previous items for free. We construct an online algorithm Full-Or-Partial-Move (FPM), whose competitive ratio is at most $3.3904$, improving over the previous best known bound of $4$.
- [13] arXiv:2504.13489 (replaced) [pdf, html, other]
-
Title: New Results on a General Class of Minimum Norm Optimization ProblemsComments: The abstract is shortened due to the length limit of arXiv. This paper has been accepted by ICALP 2025Subjects: Data Structures and Algorithms (cs.DS)
We study the general norm optimization for combinatorial problems, initiated by Chakrabarty and Swamy (STOC 2019). We propose a general formulation that captures a large class of combinatorial structures: we are given a set $U$ of $n$ weighted elements and a family of feasible subsets $F$. Each subset $S\in F$ is called a feasible solution/set of the problem. We denote the value vector by $v=\{v_i\}_{i\in [n]}$, where $v_i\geq 0$ is the value of element $i$. For any subset $S\subseteq U$, we use $v[S]$ to denote the $n$-dimensional vector $\{v_e\cdot \mathbf{1}[e\in S]\}_{e\in U}$. Let $f: \mathbb{R}^n\rightarrow\mathbb{R}_+$ be a symmetric monotone norm function. Our goal is to minimize the norm objective $f(v[S])$ over feasible subset $S\in F$.
We present a general equivalent reduction of the norm minimization problem to a multi-criteria optimization problem with logarithmic budget constraints, up to a constant approximation factor. Leveraging this reduction, we obtain constant factor approximation algorithms for the norm minimization versions of several covering problems, such as interval cover, multi-dimensional knapsack cover, and logarithmic factor approximation for set cover. We also study the norm minimization versions for perfect matching, $s$-$t$ path and $s$-$t$ cut. We show the natural linear programming relaxations for these problems have a large integrality gap. To complement the negative result, we show that, for perfect matching, there is a bi-criteria result: for any constant $\epsilon,\delta>0$, we can find in polynomial time a nearly perfect matching (i.e., a matching that matches at least $1-\epsilon$ proportion of vertices) and its cost is at most $(8+\delta)$ times of the optimum for perfect matching. Moreover, we establish the existence of a polynomial-time $O(\log\log n)$-approximation algorithm for the norm minimization variant of the $s$-$t$ path problem. - [14] arXiv:2504.19482 (replaced) [pdf, html, other]
-
Title: Dynamic r-index: An Updatable Self-Index for Highly Repetitive StringsComments: We corrected the errors in the abstractSubjects: Data Structures and Algorithms (cs.DS)
A self-index is a compressed data structure that supports locate queries-reporting all positions where a given pattern occurs in a string. While many self-indexes have been proposed, developing dynamically updatable ones supporting string insertions and deletions remains a challenge. The r-index (Gagie et al., SODA'18) is a representative static self-index based on the run-length Burrows-Wheeler transform (RLBWT), designed for highly repetitive strings - those with many repeated substrings. We present the dynamic r-index, an extension of the r-index that supports locate queries in $\mathcal{O}((m + \mathsf{occ}) \log n)$ time using $\mathcal{O}(r)$ words, where $n$ is the length of the string $T$, $m$ is the pattern length, $\mathsf{occ}$ is the number of occurrences, and $r$ is the number of runs in the RLBWT of $T$. It supports string insertions and deletions in $\mathcal{O}((m + L_{\mathsf{max}}) \log n)$ time, where $L_{\max}$ is the maximum value in the LCP array of $T$. The average running time is $\mathcal{O}((m + L_{\mathsf{avg}}) \log n)$, where $L_{\mathsf{avg}}$ is the average LCP value. We experimentally evaluated the dynamic r-index on various highly repetitive strings and demonstrated its practicality.
- [15] arXiv:2504.19842 (replaced) [pdf, html, other]
-
Title: Near-Optimal Minimum Cuts in Hypergraphs at ScaleSubjects: Data Structures and Algorithms (cs.DS)
The hypergraph minimum cut problem aims to partition its vertices into two blocks while minimizing the total weight of the cut hyperedges. This fundamental problem arises in network reliability, VLSI design, and community detection. We present HeiCut, a scalable algorithm for computing near-optimal minimum cuts in both unweighted and weighted hypergraphs. HeiCut aggressively reduces the hypergraph size through a sequence of provably exact reductions that preserve the minimum cut, along with an optional heuristic contraction based on label propagation. It then solves a relaxed Binary Integer Linear Program (BIP) on the reduced hypergraph to compute a near-optimal minimum cut. Our extensive evaluation on over 500 real-world hypergraphs shows that HeiCut computes the exact minimum cut in over 85% of instances using our exact reductions alone, and offers the best solution quality across all instances. It solves over twice as many instances as the state-of-the-art within set computational limits, and is up to five orders of magnitude faster.
- [16] arXiv:2401.11991 (replaced) [pdf, html, other]
-
Title: Tight Bounds on the Message Complexity of Distributed Tree VerificationComments: To appear in Distributed Computing, SpringerSubjects: Distributed, Parallel, and Cluster Computing (cs.DC); Data Structures and Algorithms (cs.DS)
We consider the message complexity of verifying whether a given subgraph of the communication network forms a tree with specific properties both in the KT-$\rho$ (nodes know their $\rho$-hop neighborhood, including node IDs) and the KT-$0$ (nodes do not have this knowledge) models. We develop a rather general framework that helps in establishing tight lower bounds for various tree verification problems. We also consider two different verification requirements: namely that every node detects in the case the input is incorrect, as well as the requirement that at least one node detects. The results are stronger than previous ones in the sense that we assume that each node knows the number $n$ of nodes in the graph (in some cases) or an $\alpha$ approximation of $n$ (in other cases). For spanning tree verification, we show that the message complexity inherently depends on the quality of the given approximation of $n$: We show a tight lower bound of $\Omega(n^2)$ for the case $\alpha \ge \sqrt{2}$ and a much better upper bound (i.e., $O(n \log n)$) when nodes are given a tighter approximation. On the other hand, our framework also yields an $\Omega(n^2)$ lower bound on the message complexity of verifying a minimum spanning tree (MST), which reveals a polynomial separation between ST verification and MST verification. This result holds for randomized algorithms with perfect knowledge of the network size, and even when just one node detects illegal inputs, thus improving over the work of Kor, Korman, and Peleg (2013). For verifying a $d$-approximate BFS tree, we show that the same lower bound holds even if nodes know $n$ exactly, however, the lower bound is sensitive to $d$, which is the stretch parameter.
- [17] arXiv:2401.13185 (replaced) [pdf, other]
-
Title: Fast Partition-Based Cross-Validation With Centering and Scaling for $\mathbf{X}^\mathbf{T}\mathbf{X}$ and $\mathbf{X}^\mathbf{T}\mathbf{Y}$Comments: This version matches the published article in Journal of Chemometrics, DOI: this https URLJournal-ref: Journal of Chemometrics. Volume 39, Issue 3, 2025, e70008Subjects: Machine Learning (cs.LG); Data Structures and Algorithms (cs.DS)
We present algorithms that substantially accelerate partition-based cross-validation for machine learning models that require matrix products $\mathbf{X}^\mathbf{T}\mathbf{X}$ and $\mathbf{X}^\mathbf{T}\mathbf{Y}$. Our algorithms have applications in model selection for, for example, principal component analysis (PCA), principal component regression (PCR), ridge regression (RR), ordinary least squares (OLS), and partial least squares (PLS). Our algorithms support all combinations of column-wise centering and scaling of $\mathbf{X}$ and $\mathbf{Y}$, and we demonstrate in our accompanying implementation that this adds only a manageable, practical constant over efficient variants without preprocessing. We prove the correctness of our algorithms under a fold-based partitioning scheme and show that the running time is independent of the number of folds; that is, they have the same time complexity as that of computing $\mathbf{X}^\mathbf{T}\mathbf{X}$ and $\mathbf{X}^\mathbf{T}\mathbf{Y}$ and space complexity equivalent to storing $\mathbf{X}$, $\mathbf{Y}$, $\mathbf{X}^\mathbf{T}\mathbf{X}$, and $\mathbf{X}^\mathbf{T}\mathbf{Y}$. Importantly, unlike alternatives found in the literature, we avoid data leakage due to preprocessing. We achieve these results by eliminating redundant computations in the overlap between training partitions. Concretely, we show how to manipulate $\mathbf{X}^\mathbf{T}\mathbf{X}$ and $\mathbf{X}^\mathbf{T}\mathbf{Y}$ using only samples from the validation partition to obtain the preprocessed training partition-wise $\mathbf{X}^\mathbf{T}\mathbf{X}$ and $\mathbf{X}^\mathbf{T}\mathbf{Y}$. To our knowledge, we are the first to derive correct and efficient cross-validation algorithms for any of the $16$ combinations of column-wise centering and scaling, for which we also prove only $12$ give distinct matrix products.
- [18] arXiv:2405.19037 (replaced) [pdf, html, other]
-
Title: Formalizing the notions of non-interactive and interactive algorithmsComments: 29 pages, major revision of v2 with more attention to the similarities and differences between non-interactive algorithms and interactive algorithmsSubjects: Computational Complexity (cs.CC); Data Structures and Algorithms (cs.DS); Logic in Computer Science (cs.LO)
An earlier paper gives an account of a quest for a satisfactory formalization of the classical informal notion of an algorithm. That notion only covers algorithms that are deterministic and non-interactive. In this paper, an attempt is made to generalize the results of that quest first to a notion of an algorithm that covers both deterministic and non-deterministic algorithms that are non-interactive and then further to a notion of an algorithm that covers both deterministic and non-deterministic algorithms that are interactive. The notions of an non-interactive proto-algorithm and an interactive proto-algorithm are introduced. Non-interactive algorithms and interactive algorithms are expected to be equivalence classes of non-interactive proto-algorithms and interactive proto-algorithms, respectively, under an appropriate equivalence relation. On both non-interactive proto-algorithms and interactive proto-algorithms, three equivalence relations are defined. Two of them are deemed to be bounds for an appropriate equivalence relation and the third is likely an appropriate one.
- [19] arXiv:2410.05807 (replaced) [pdf, html, other]
-
Title: Extended convexity and smoothness and their applications in deep learningSubjects: Machine Learning (cs.LG); Data Structures and Algorithms (cs.DS); Optimization and Control (math.OC)
Classical assumptions like strong convexity and Lipschitz smoothness often fail to capture the nature of deep learning optimization problems, which are typically non-convex and non-smooth, making traditional analyses less applicable. This study aims to elucidate the mechanisms of non-convex optimization in deep learning by extending the conventional notions of strong convexity and Lipschitz smoothness. By leveraging these concepts, we prove that, under the established constraints, the empirical risk minimization problem is equivalent to optimizing the local gradient norm and structural error, which together constitute the upper and lower bounds of the empirical risk. Furthermore, our analysis demonstrates that the stochastic gradient descent (SGD) algorithm can effectively minimize the local gradient norm. Additionally, techniques like skip connections, over-parameterization, and random parameter initialization are shown to help control the structural error. Ultimately, we validate the core conclusions of this paper through extensive experiments. Theoretical analysis and experimental results indicate that our findings provide new insights into the mechanisms of non-convex optimization in deep learning.