Showing 1 - 10 of 311 for search: '"MEYER, FRANCOIS"'
Author:
Meyer, François G
The notion of Fréchet mean (also known as "barycenter") network is the workhorse of most machine learning algorithms that require the estimation of a "location" parameter to analyse network-valued data. In this context, it is critical that the netw
External link:
http://arxiv.org/abs/2408.03461
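Several entries in this list rely on the Fréchet mean and median of a sample of graphs. For reference, these are the standard definitions for a sample G_1, ..., G_N and a chosen (pseudo-)metric d on the set \mathcal{G} of graphs. The minimizers need not be unique. In a Euclidean space with the Euclidean distance, the first problem is solved by the ordinary arithmetic mean.

```latex
\bar{G}_N \in \operatorname*{arg\,min}_{G \in \mathcal{G}} \; \frac{1}{N} \sum_{k=1}^{N} d^{2}(G, G_k)
\quad \text{(sample Fr\'echet mean)},
\qquad
\tilde{G}_N \in \operatorname*{arg\,min}_{G \in \mathcal{G}} \; \frac{1}{N} \sum_{k=1}^{N} d(G, G_k)
\quad \text{(sample Fr\'echet median)}.
```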
Author:
Meyer, Francois, Buys, Jan
Multilingual modelling can improve machine translation for low-resource languages, partly through shared subword representations. This paper studies the role of subword segmentation in cross-lingual transfer. We systematically compare the efficacy of
External link:
http://arxiv.org/abs/2403.20157
Author:
Meyer, Francois, Buys, Jan
Most data-to-text datasets are for English, so the difficulties of modelling data-to-text for low-resource languages are largely unexplored. In this paper we tackle data-to-text for isiXhosa, which is low-resource and agglutinative. We introduce Trip
External link:
http://arxiv.org/abs/2403.07567
Author:
Meyer, Francois, Buys, Jan
Subword segmenters like BPE operate as a preprocessing step in neural machine translation and other (conditional) language models. They are applied to datasets before training, so translation or text generation quality relies on the quality of segmen
External link:
http://arxiv.org/abs/2305.07005
The paper describes the University of Cape Town's submission to the constrained track of the WMT22 Shared Task: Large-Scale Machine Translation Evaluation for African Languages. Our system is a single multilingual translation model that translates be
External link:
http://arxiv.org/abs/2210.11757
Author:
Sanchez, Adam, Meyer, François G.
This work addresses the rising demand for novel tools in statistical and machine learning for "graph-valued random variables" by proposing a fast algorithm to compute the sample Fréchet mean, which replaces the concept of sample mean for graphs (or n
External link:
http://arxiv.org/abs/2210.07401
Author:
Meyer, Francois, Buys, Jan
Subwords have become the standard units of text in NLP, enabling efficient open-vocabulary models. With algorithms like byte-pair encoding (BPE), subword segmentation is viewed as a preprocessing step applied to the corpus before training. This can l
External link:
http://arxiv.org/abs/2210.06525
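Two of the abstracts above describe BPE as a preprocessing step that learns a segmentation from the corpus before training. The following is a minimal sketch of that baseline: the classic greedy merge-learning loop, not the segmentation approach studied in these papers. It assumes a plain list of words and a `</w>` end-of-word marker; the function names `merge_pair`, `learn_bpe`, and `segment` are hypothetical.

```python
from collections import Counter


def merge_pair(symbols, pair):
    """Replace each non-overlapping occurrence of `pair` in `symbols` with one merged symbol."""
    merged, i = [], 0
    while i < len(symbols):
        if i + 1 < len(symbols) and (symbols[i], symbols[i + 1]) == pair:
            merged.append(symbols[i] + symbols[i + 1])
            i += 2
        else:
            merged.append(symbols[i])
            i += 1
    return tuple(merged)


def learn_bpe(corpus, num_merges):
    """Learn up to `num_merges` BPE merge operations from a list of words."""
    # Each word starts as a tuple of characters plus an (assumed) end-of-word marker.
    vocab = Counter(tuple(word) + ("</w>",) for word in corpus)
    merges = []
    for _ in range(num_merges):
        # Count adjacent symbol pairs, weighted by word frequency.
        pairs = Counter()
        for symbols, freq in vocab.items():
            for pair in zip(symbols, symbols[1:]):
                pairs[pair] += freq
        if not pairs:
            break
        # Greedily pick the most frequent pair (ties go to the first pair seen).
        best = max(pairs, key=pairs.get)
        merges.append(best)
        # Rewrite the vocabulary with the chosen pair merged everywhere.
        new_vocab = Counter()
        for symbols, freq in vocab.items():
            new_vocab[merge_pair(symbols, best)] += freq
        vocab = new_vocab
    return merges


def segment(word, merges):
    """Segment a (possibly unseen) word by replaying the learned merges in order."""
    symbols = tuple(word) + ("</w>",)
    for pair in merges:
        symbols = merge_pair(symbols, pair)
    return list(symbols)


merges = learn_bpe(["hug", "hug", "hug", "pug", "hugs"], num_merges=1)
print(merges)                  # [('u', 'g')]
print(segment("bug", merges))  # ['b', 'ug', '</w>']
```

Because the merges are fixed once learned and simply replayed at segmentation time, the segmentation is decided before any model is trained on the data, which is what makes it a preprocessing step.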
Author:
Ferguson, Daniel, Meyer, François G.
For graph-valued data sampled iid from a distribution $\mu$, the sample moments are computed with respect to a choice of metric. In this work, we equip the set of graphs with the pseudo-metric defined by the $\ell_2$ norm between the eigenvalues of t
External link:
http://arxiv.org/abs/2207.02168
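The abstract above equips graphs with a pseudo-metric given by the $\ell_2$ norm between eigenvalues, but the snippet is cut off before it says which matrix's eigenvalues are used. The sketch below assumes the adjacency matrix (an assumption, not necessarily the paper's choice) and pairs eigenvalues in sorted order. The helper names `adjacency` and `spectral_distance` are hypothetical. It also illustrates why this is only a pseudo-metric: two non-isomorphic graphs with the same spectrum are at distance zero.

```python
import numpy as np


def adjacency(n, edges):
    """Symmetric 0/1 adjacency matrix of a simple undirected graph on n vertices."""
    A = np.zeros((n, n))
    for u, v in edges:
        A[u, v] = A[v, u] = 1.0
    return A


def spectral_distance(A, B):
    """l2 distance between the sorted adjacency spectra of two graphs of the same size.

    Assumption: the spectrum is taken from the adjacency matrix; another matrix
    (e.g. a Laplacian) would plug in the same way.
    """
    # eigvalsh is for symmetric matrices and returns eigenvalues in ascending order,
    # so the k-th smallest eigenvalue of A is compared with the k-th smallest of B.
    return float(np.linalg.norm(np.linalg.eigvalsh(A) - np.linalg.eigvalsh(B)))


# The star K_{1,4} and the 4-cycle plus an isolated vertex are non-isomorphic
# but share the spectrum {-2, 0, 0, 0, 2}, so their distance is (numerically) zero.
star = adjacency(5, [(0, 1), (0, 2), (0, 3), (0, 4)])
c4k1 = adjacency(5, [(0, 1), (1, 2), (2, 3), (3, 0)])
print(spectral_distance(star, c4k1))
```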
Author:
Meyer, Francois G.
We address the following foundational question: what is the population, and sample, Fréchet mean (or median) graph of an ensemble of inhomogeneous Erdős-Rényi random graphs? We prove that if we use the Hamming distance to compute distances between gr
External link:
http://arxiv.org/abs/2201.11954
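The abstract above is cut off before stating its result, so the following does not reproduce it. It only checks one elementary fact about the Hamming distance between adjacency matrices. The sum of Hamming distances to the sample splits into independent per-edge terms, so a sample Fréchet median can be obtained by a majority vote on each vertex pair. The sketch assumes simple undirected graphs on a common vertex set, given as 0/1 adjacency matrices. Ties are broken toward "no edge", which is one valid choice among several. The helper names `adjacency`, `hamming`, and `frechet_median` are hypothetical.

```python
import numpy as np


def adjacency(n, edges):
    """Symmetric 0/1 adjacency matrix of a simple undirected graph on n vertices."""
    A = np.zeros((n, n), dtype=int)
    for u, v in edges:
        A[u, v] = A[v, u] = 1
    return A


def hamming(A, B):
    """Hamming distance: number of vertex pairs {i, j} on which two graphs disagree."""
    # Count each unordered pair once by keeping only the strict upper triangle.
    return int(np.triu(A != B, k=1).sum())


def frechet_median(graphs):
    """A sample Frechet median under the Hamming distance, via edgewise majority vote.

    Each vertex pair contributes independently to the total distance, so it is
    optimised on its own: keep the edge iff more than half of the sample has it.
    Ties (possible for even sample sizes) are broken toward "no edge" here.
    """
    votes = np.stack(graphs).sum(axis=0)  # number of sample graphs containing each edge
    return (2 * votes > len(graphs)).astype(int)


g1 = adjacency(3, [(0, 1), (1, 2)])
g2 = adjacency(3, [(0, 1)])
g3 = adjacency(3, [(0, 1), (0, 2)])
median = frechet_median([g1, g2, g3])
print(sum(hamming(median, g) for g in (g1, g2, g3)))  # 2
```

The decomposition works only for the median (sum of distances). The Fréchet mean minimizes a sum of squared distances, which does not split edge by edge in the same way.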
Author:
Ferguson, Daniel, Meyer, Francois G.
To characterize the location (mean, median) of a set of graphs, one needs a notion of centrality that is adapted to metric spaces, since graph sets are not Euclidean spaces. A standard approach is to consider the Fréchet mean. In this work, we equip
External link:
http://arxiv.org/abs/2201.05923