Journal of Machine Learning Research Papers: Volume 13の論文一覧

Journal of Machine Learning Research Papers Volume 13に記載されている内容を一覧にまとめ、機械翻訳を交えて日本語化し掲載します。
Exploration in Relational Domains for Model-based Reinforcement Learning
モデルベース強化学習のためのリレーショナルドメインの探索

A fundamental problem in reinforcement learning is balancing exploration and exploitation. We address this problem in the context of model-based reinforcement learning in large stochastic relational domains by developing relational extensions of the concepts of the E3 and R-MAX algorithms. Efficient exploration in exponentially large state spaces needs to exploit the generalization of the learned model: what in a propositional setting would be considered a novel situation and worth exploration may in the relational setting be a well-known context in which exploitation is promising. To address this we introduce relational count functions which generalize the classical notion of state and action visitation counts. We provide guarantees on the exploration efficiency of our framework using count functions under the assumption that we had a relational KWIK learner and a near-optimal planner. We propose a concrete exploration algorithm which integrates a practically efficient probabilistic rule learner and a relational planner (for which there are no guarantees, however) and employs the contexts of learned relational rules as features to model the novelty of states and actions. Our results in noisy 3D simulated robot manipulation problems and in domains of the international planning competition demonstrate that our approach is more effective than existing propositional and factored exploration techniques.

強化学習における基本的な問題は、探索と活用のバランスをとることです。私たちは、E3アルゴリズムとR-MAXアルゴリズムの概念のリレーショナル拡張を開発することで、大規模な確率的関係ドメインにおけるモデルベースの強化学習のコンテキストでこの問題に対処します。指数関数的に大きな状態空間での効率的な探索には、学習したモデルの一般化を活用する必要があります。命題設定では新しい状況と見なされ探索する価値があるものが、リレーショナル設定では活用が期待できるよく知られたコンテキストである可能性があります。これに対処するために、状態とアクションの訪問回数の古典的な概念を一般化するリレーショナルカウント関数を導入します。私たちは、リレーショナルKWIK学習器とほぼ最適なプランナーがあるという仮定の下で、カウント関数を使用してフレームワークの探索効率を保証します。私たちは、実際に効率的な確率的ルール学習器とリレーショナルプランナー(ただし、保証はありません)を統合し、学習したリレーショナルルールのコンテキストを特徴として使用して状態とアクションの新規性をモデル化する具体的な探索アルゴリズムを提案します。ノイズの多い3Dシミュレーションロボット操作問題と国際計画コンペティションの領域における私たちの結果は、私たちのアプローチが既存の命題的および因子別探索手法よりも効果的であることを示しています。

Security Analysis of Online Centroid Anomaly Detection
オンライン重心異常検出のセキュリティ分析

Security issues are crucial in a number of machine learning applications, especially in scenarios dealing with human activity rather than natural phenomena (e.g., information ranking, spam detection, malware detection, etc.). In such cases, learning algorithms may have to cope with manipulated data aimed at hampering decision making. Although some previous work addressed the issue of handling malicious data in the context of supervised learning, very little is known about the behavior of anomaly detection methods in such scenarios. In this contribution, we analyze the performance of a particular method—online centroid anomaly detection—in the presence of adversarial noise. Our analysis addresses the following security-related issues: formalization of learning and attack processes, derivation of an optimal attack, and analysis of attack efficiency and limitations. We derive bounds on the effectiveness of a poisoning attack against centroid anomaly detection under different conditions: attacker’s full or limited control over the traffic and bounded false positive rate. Our bounds show that whereas a poisoning attack can be effectively staged in the unconstrained case, it can be made arbitrarily difficult (a strict upper bound on the attacker’s gain) if external constraints are properly used. Our experimental evaluation, carried out on real traces of HTTP and exploit traffic, confirms the tightness of our theoretical bounds and the practicality of our protection mechanisms.

セキュリティの問題は、多くの機械学習アプリケーション、特に自然現象ではなく人間の活動を扱うシナリオ(情報ランキング、スパム検出、マルウェア検出など)で重要です。このような場合、学習アルゴリズムは、意思決定を妨げることを目的とした操作されたデータに対処しなければならない場合があります。以前の研究では、教師あり学習のコンテキストで悪意のあるデータを処理する問題に対処しましたが、このようなシナリオでの異常検出方法の動作についてはほとんどわかっていません。この投稿では、敵対的ノイズの存在下での特定の方法(オンラインセントロイド異常検出)のパフォーマンスを分析します。分析では、学習および攻撃プロセスの形式化、最適な攻撃の導出、攻撃の効率と制限の分析というセキュリティ関連の問題を取り上げます。さまざまな条件下で、つまり攻撃者がトラフィックを完全にまたは限定的に制御し、誤検出率が制限されている状況で、セントロイド異常検出に対するポイズニング攻撃の有効性の限界を導きます。我々の限界は、制約がない場合にはポイズニング攻撃を効果的に実行できるが、外部制約を適切に使用すれば攻撃を任意に困難にできる（攻撃者の利益に厳格な上限を設ける）ことを示しています。HTTPおよびエクスプロイトトラフィックの実際のトレースで実行された実験的評価により、我々の理論的な限界の厳密さと保護メカニズムの実用性が裏付けられました。

Smoothing Multivariate Performance Measures
多変量パフォーマンス指標の平滑化

Optimizing multivariate performance measure is an important task in Machine Learning. Joachims (2005) introduced a Support Vector Method whose underlying optimization problem is commonly solved by cutting plane methods (CPMs) such as SVM-Perf and BMRM. It can be shown that CPMs converge to an ε accurate solution in O(1/λ ε) iterations, where λ is the trade-off parameter between the regularizer and the loss function. Motivated by the impressive convergence rate of CPM on a number of practical problems, it was conjectured that these rates can be further improved. We disprove this conjecture in this paper by constructing counter examples. However, surprisingly, we further discover that these problems are not inherently hard, and we develop a novel smoothing strategy, which in conjunction with Nesterov’s accelerated gradient method, can find an ε accurate solution in O* (min {1/ε, 1/√λε}) iterations. Computationally, our smoothing technique is also particularly advantageous for optimizing multivariate performance scores such as precision/recall break-even point and ROCArea; the cost per iteration remains the same as that of CPMs. Empirical evaluation on some of the largest publicly available data sets shows that our method converges significantly faster than CPMs without sacrificing generalization ability.

多変量パフォーマンス測定の最適化は、機械学習における重要なタスクです。Joachims (2005)は、SVM-PerfやBMRMなどのカッティングプレーン法(CPM)によって解決される基礎となる最適化問題をサポートするベクター法を発表しました。CPMは、O(1/λ ε)回の反復で ε 精度の解に収束することが示されています。ここで、λ は、正則化子と損失関数の間のトレードオフパラメーターです。多くの実用的な問題に対するCPMの優れた収束率に動機付けられて、これらの率はさらに改善できると推測されました。この論文では、反例を構築することでこの推測を反証します。しかし、驚くべきことに、これらの問題は本質的に困難ではないことがさらにわかり、Nesterovの加速勾配法と組み合わせて、O* (min {1/ε, 1/√λε})回の反復で ε 精度の解を見つけることができる新しい平滑化戦略を開発しました。計算上、私たちの平滑化技術は、精度/再現率損益分岐点やROCAreaなどの多変量パフォーマンススコアを最適化するのにも特に有利です。反復あたりのコストはCPMと同じです。公開されている最大のデータセットのいくつかで実証的に評価したところ、私たちの方法は一般化能力を犠牲にすることなくCPMよりも大幅に速く収束することが示されました。

SVDFeature: A Toolkit for Feature-based Collaborative Filtering
SVDFeature: 機能ベースの協調フィルタリングのためのツールキット

In this paper we introduce SVDFeature, a machine learning toolkit for feature-based collaborative filtering. SVDFeature is designed to efficiently solve the feature-based matrix factorization. The feature-based setting allows us to build factorization models incorporating side information such as temporal dynamics, neighborhood relationship, and hierarchical information. The toolkit is capable of both rate prediction and collaborative ranking, and is carefully designed for efficient training on large-scale data set. Using this toolkit, we built solutions to win KDD Cup for two consecutive years.

この論文では、機能ベースの協調フィルタリングのための機械学習ツールキットであるSVDFeatureを紹介します。SVDFeatureは、特徴ベースの行列分解を効率的に解くように設計されています。特徴量ベースの設定により、時間ダイナミクス、近傍関係、階層情報などのサイド情報を組み込んだ因数分解モデルを構築することができます。このツールキットは、レート予測と協調的なランク付けの両方を行うことができ、大規模なデータセットで効率的にトレーニングできるように慎重に設計されています。このツールキットを使用して、KDDカップを2年連続で獲得するためのソリューションを構築しました。

Learning Symbolic Representations of Hybrid Dynamical Systems
ハイブリッド力学系の記号表現の学習

A hybrid dynamical system is a mathematical model suitable for describing an extensive spectrum of multi-modal, time-series behaviors, ranging from bouncing balls to air traffic controllers. This paper describes multi-modal symbolic regression (MMSR): a learning algorithm to construct non-linear symbolic representations of discrete dynamical systems with continuous mappings from unlabeled, time-series data. MMSR consists of two subalgorithms—clustered symbolic regression, a method to simultaneously identify distinct behaviors while formulating their mathematical expressions, and transition modeling, an algorithm to infer symbolic inequalities that describe binary classification boundaries. These subalgorithms are combined to infer hybrid dynamical systems as a collection of apt, mathematical expressions. MMSR is evaluated on a collection of four synthetic data sets and outperforms other multi-modal machine learning approaches in both accuracy and interpretability, even in the presence of noise. Furthermore, the versatility of MMSR is demonstrated by identifying and inferring classical expressions of transistor modes from recorded measurements.

ハイブリッド動的システムは、跳ねるボールから航空管制官まで、幅広いマルチモーダル時系列動作を記述するのに適した数学モデルです。この論文では、マルチモーダルシンボリック回帰(MMSR)について説明します。これは、ラベルなしの時系列データから連続マッピングを使用して、離散動的システムの非線形シンボリック表現を構築する学習アルゴリズムです。MMSRは、2つのサブアルゴリズムで構成されています。1つはクラスター化シンボリック回帰で、これは異なる動作を同時に識別しながらその数式を定式化する方法であり、もう1つは遷移モデリングです。これは、バイナリ分類境界を記述するシンボリック不等式を推測するアルゴリズムです。これらのサブアルゴリズムを組み合わせることで、ハイブリッド動的システムを適切な数式のコレクションとして推測します。MMSRは、4つの合成データセットのコレクションで評価され、ノイズが存在する場合でも、精度と解釈可能性の両方で他のマルチモーダル機械学習アプローチよりも優れています。さらに、記録された測定値からトランジスタモードの古典的な表現を識別および推論することによって、MMSRの汎用性が実証されます。

Regularized Bundle Methods for Convex and Non-Convex Risks
凸リスクと非凸リスクの正則化バンドル法

Machine learning is most often cast as an optimization problem. Ideally, one expects a convex objective function to rely on efficient convex optimizers with nice guarantees such as no local optima. Yet, non-convexity is very frequent in practice and it may sometimes be inappropriate to look for convexity at any price. Alternatively one can decide not to limit a priori the modeling expressivity to models whose learning may be solved by convex optimization and rely on non-convex optimization algorithms. The main motivation of this work is to provide efficient and scalable algorithms for non-convex optimization. We focus on regularized unconstrained optimization problems which cover a large number of modern machine learning problems such as logistic regression, conditional random fields, large margin estimation, etc. We propose a novel algorithm for minimizing a regularized objective that is able to handle convex and non-convex, smooth and non-smooth risks. The algorithm is based on the cutting plane technique and on the idea of exploiting the regularization term in the objective function. It may be thought as a limited memory extension of convex regularized bundle methods for dealing with convex and non convex risks. In case the risk is convex the algorithm is proved to converge to a stationary solution with accuracy ε with a rate O(1/λε) where λ is the regularization parameter of the objective function under the assumption of a Lipschitz empirical risk. In case the risk is not convex getting such a proof is more difficult and requires a stronger and more disputable assumption. Yet we provide experimental results on artificial test problems, and on five standard and difficult machine learning problems that are cast as convex and non-convex optimization problems that show how our algorithm compares well in practice with state of the art optimization algorithms.

機械学習は、ほとんどの場合、最適化問題として扱われます。理想的には、凸目的関数は、局所最適値がないなどの優れた保証を備えた効率的な凸最適化器に依存することが期待されます。しかし、非凸性は実際には非常に頻繁に発生し、いかなる代償を払ってでも凸性を求めることが適切でない場合があります。あるいは、学習が凸最適化によって解決できるモデルにモデリング表現力を事前に制限せず、非凸最適化アルゴリズムに依存することを決定することもできます。この研究の主な目的は、非凸最適化のための効率的でスケーラブルなアルゴリズムを提供することです。ロジスティック回帰、条件付きランダムフィールド、大マージン推定など、多数の最新の機械学習問題をカバーする正規化された制約なし最適化問題に焦点を当てています。凸および非凸、滑らかなリスクおよび滑らかでないリスクを処理できる正規化された目的関数を最小化する新しいアルゴリズムを提案します。このアルゴリズムは、切断面手法と、目的関数の正規化項を活用するというアイデアに基づいています。これは、凸リスクと非凸リスクを扱うための凸正規化バンドル法の限定メモリ拡張と考えることができます。リスクが凸の場合、アルゴリズムは、精度 ε で定常解に収束することが証明されており、収束率はO(1/λε)です。ここで、λ は、リプシッツ経験的リスクの仮定の下での目的関数の正規化パラメータです。リスクが凸でない場合、そのような証明を得るのはより困難で、より強力で議論の余地のある仮定が必要になります。しかし、私たちは人工テスト問題、および凸および非凸最適化問題として投げかけられた5つの標準的かつ困難な機械学習問題に関する実験結果を提供し、私たちのアルゴリズムが実際に最先端の最適化アルゴリズムと比べて優れていることを示しています。

DARWIN: A Framework for Machine Learning and Computer Vision Research and Development
DARWIN:機械学習とコンピュータビジョンの研究開発のためのフレームワーク

We present an open-source platform-independent C++ framework for machine learning and computer vision research. The framework includes a wide range of standard machine learning and graphical models algorithms as well as reference implementations for many machine learning and computer vision applications. The framework contains Matlab wrappers for core components of the library and an experimental graphical user interface for developing and visualizing machine learning data flows.

私たちは、機械学習とコンピュータービジョン研究のためのオープンソースのプラットフォームに依存しないC ++フレームワークを紹介します。このフレームワークには、さまざまな標準的な機械学習とグラフィカルモデル、アルゴリズム、および多くの機械学習およびコンピュータービジョンアプリケーションのリファレンス実装が含まれています。このフレームワークには、ライブラリのコアコンポーネント用のMatlabラッパーと、機械学習データフローを開発および視覚化するための実験的なグラフィカルユーザーインターフェイスが含まれています。

PAC-Bayes Bounds with Data Dependent Priors
データ依存事前確率を持つ PACベイズ境界

This paper presents the prior PAC-Bayes bound and explores its capabilities as a tool to provide tight predictions of SVMs’ generalization. The computation of the bound involves estimating a prior of the distribution of classifiers from the available data, and then manipulating this prior in the usual PAC-Bayes generalization bound. We explore two alternatives: to learn the prior from a separate data set, or to consider an expectation prior that does not need this separate data set. The prior PAC-Bayes bound motivates two SVM-like classification algorithms, prior SVM and η-prior SVM, whose regularization term pushes towards the minimization of the prior PAC-Bayes bound. The experimental work illustrates that the new bounds can be significantly tighter than the original PAC-Bayes bound when applied to SVMs, and among them the combination of the prior PAC-Bayes bound and the prior SVM algorithm gives the tightest bound.

この論文では、以前のPAC-Bayes束縛を提示し、SVMの一般化を厳密に予測するためのツールとしてのその機能について説明します。範囲の計算には、使用可能なデータから分類器の分布の事前分布を推定し、通常のPAC-Bayes一般化境界でこの事前確率を操作することが含まれます。別のデータセットから事前確率を学習するか、この個別のデータセットを必要としない期待事前確率を考慮するかの2つの選択肢を検討します。事前のPAC-ベイズ境界は、事前のSVMとηのSVMのような2つの分類アルゴリズムを動機付けます。これらの正則化項は、事前のPAC-ベイズ境界の最小化に向けて推進されます。この実験作業は、SVMに適用すると、新しい境界が元のPAC-Bayes境界よりも大幅に狭くなる可能性があり、その中で以前のPAC-Bayes境界と以前のSVMアルゴリズムの組み合わせが最も厳しい境界になることを示しています。

Fast Approximation of Matrix Coherence and Statistical Leverage
行列コヒーレンスと統計的レバレッジの高速近似

The statistical leverage scores of a matrix A are the squared row-norms of the matrix containing its (top) left singular vectors and the coherence is the largest leverage score. These quantities are of interest in recently-popular problems such as matrix completion and Nyström-based low-rank matrix approximation as well as in large-scale statistical data analysis applications more generally; moreover, they are of interest since they define the key structural nonuniformity that must be dealt with in developing fast randomized matrix algorithms. Our main result is a randomized algorithm that takes as input an arbitrary n × d matrix A, with n >> d, and that returns as output relative-error approximations to all n of the statistical leverage scores. The proposed algorithm runs (under assumptions on the precise values of n and d) in O(n d log n) time, as opposed to the O(nd2) time required by the naïve algorithm that involves computing an orthogonal basis for the range of A. Our analysis may be viewed in terms of computing a relative-error approximation to an underconstrained least-squares approximation problem, or, relatedly, it may be viewed as an application of Johnson-Lindenstrauss type ideas. Several practically-important extensions of our basic result are also described, including the approximation of so-called cross-leverage scores, the extension of these ideas to matrices with n ≈ d, and the extension to streaming environments.

行列Aの統計的レバレッジスコアは、その(最上位の)左特異ベクトルを含む行列の行ノルムの2乗であり、コヒーレンスは最大のレバレッジスコアです。これらの量は、行列補完やNyströmベースの低ランク行列近似などの最近人気の問題だけでなく、より一般的には大規模な統計データ解析アプリケーションでも重要です。さらに、高速ランダム化行列アルゴリズムの開発で対処する必要がある重要な構造的非均一性を定義するため、重要です。私たちの主な結果は、任意のn×d行列A (n >> d)を入力として受け取り、すべてのn個の統計的レバレッジスコアの相対誤差近似を出力として返すランダム化アルゴリズムです。提案されたアルゴリズムは、(nとdの正確な値に関する仮定の下で) O(n d log n)時間で実行されます。これは、Aの範囲の直交基底を計算する単純なアルゴリズムで必要なO(nd2)時間とは異なります。私たちの分析は、制約不足の最小二乗近似問題に対する相対誤差近似を計算するという観点から見ることができます。または、関連して、ジョンソン-リンデンシュトラウス型のアイデアの応用として見ることができます。いわゆるクロスレバレッジスコアの近似、これらのアイデアのn≈dの行列への拡張、ストリーミング環境への拡張など、私たちの基本的な結果のいくつかの実用的に重要な拡張についても説明します。

Iterative Reweighted Algorithms for Matrix Rank Minimization
行列ランクの最小化のための反復再重み付けアルゴリズム

The problem of minimizing the rank of a matrix subject to affine constraints has applications in several areas including machine learning, and is known to be NP-hard. A tractable relaxation for this problem is nuclear norm (or trace norm) minimization, which is guaranteed to find the minimum rank matrix under suitable assumptions. In this paper, we propose a family of Iterative Reweighted Least Squares algorithms IRLS-p (with 0 ≤ p ≤ 1), as a computationally efficient way to improve over the performance of nuclear norm minimization. The algorithms can be viewed as (locally) minimizing certain smooth approximations to the rank function. When p=1, we give theoretical guarantees similar to those for nuclear norm minimization, that is, recovery of low-rank matrices under certain assumptions on the operator defining the constraints. For p < 1, IRLS-p shows better empirical performance in terms of recovering low-rank matrices than nuclear norm minimization. We provide an efficient implementation for IRLS-p, and also present a related family of algorithms, sIRLS-p. These algorithms exhibit competitive run times and improved recovery when compared to existing algorithms for random instances of the matrix completion problem, as well as on the MovieLens movie recommendation data set.

アフィン制約の対象となる行列のランクを最小化する問題は、機械学習を含むいくつかの分野に応用されており、NP困難であることが知られています。この問題の扱いやすい緩和策は、適切な仮定の下で最小ランク行列を見つけることが保証されている核ノルム（またはトレースノルム）最小化です。この論文では、核ノルム最小化のパフォーマンスを向上させる計算効率の高い方法として、反復再加重最小二乗アルゴリズムIRLS-p（0≤p≤1）のファミリを提案します。このアルゴリズムは、ランク関数の特定の滑らかな近似を（局所的に）最小化するものと見ることができます。p = 1の場合、核ノルム最小化の場合と同様の理論的保証、つまり、制約を定義する演算子に関する特定の仮定の下での低ランク行列の回復を提供します。p < 1の場合、低ランク行列の回復に関して、IRLS-pは核ノルム最小化よりも優れた経験的パフォーマンスを示します。IRLS-pの効率的な実装を提供し、関連するアルゴリズムファミリであるsIRLS-pも紹介します。これらのアルゴリズムは、行列補完問題のランダムインスタンスやMovieLens映画推奨データセットにおいて、既存のアルゴリズムと比較して競争力のある実行時間と改善された回復を示します。

Learning Linear Cyclic Causal Models with Latent Variables
潜在変数を用いた線形巡回因果モデルの学習

Identifying cause-effect relationships between variables of interest is a central problem in science. Given a set of experiments we describe a procedure that identifies linear models that may contain cycles and latent variables. We provide a detailed description of the model family, full proofs of the necessary and sufficient conditions for identifiability, a search algorithm that is complete, and a discussion of what can be done when the identifiability conditions are not satisfied. The algorithm is comprehensively tested in simulations, comparing it to competing algorithms in the literature. Furthermore, we adapt the procedure to the problem of cellular network inference, applying it to the biologically realistic data of the DREAM challenges. The paper provides a full theoretical foundation for the causal discovery procedure first presented by Eberhardt et al. (2010) and Hyttinen et al. (2010).

関心のある変数間の因果関係を特定することは、科学の中心的な問題です。一連の実験が与えられた場合、サイクルと潜在変数を含む可能性のある線形モデルを特定する手順について説明します。モデルファミリーの詳細な説明、識別可能性の必要条件と十分条件の完全な証明、完全な検索アルゴリズム、および識別可能性の条件が満たされない場合に何ができるかについての議論を提供します。このアルゴリズムは、シミュレーションで包括的にテストされ、文献の競合アルゴリズムと比較されます。さらに、この手順をセルラーネットワーク推論の問題に適応させ、DREAMチャレンジの生物学的に現実的なデータに適用します。この論文では、Eberhardtら(2010)とHyttinenら(2010)によって最初に提示された因果的発見手順の完全な理論的基盤を提供します。

Sparse and Unique Nonnegative Matrix Factorization Through Data Preprocessing
データ前処理によるスパースで一意な非負行列の因数分解

Nonnegative matrix factorization (NMF) has become a very popular technique in machine learning because it automatically extracts meaningful features through a sparse and part-based representation. However, NMF has the drawback of being highly ill-posed, that is, there typically exist many different but equivalent factorizations. In this paper, we introduce a completely new way to obtaining more well-posed NMF problems whose solutions are sparser. Our technique is based on the preprocessing of the nonnegative input data matrix, and relies on the theory of M-matrices and the geometric interpretation of NMF. This approach provably leads to optimal and sparse solutions under the separability assumption of Donoho and Stodden (2003), and, for rank-three matrices, makes the number of exact factorizations finite. We illustrate the effectiveness of our technique on several image data sets.

非負行列因数分解(NMF)は、スパースで部分ベースの表現を通じて意味のある特徴を自動的に抽出するため、機械学習で非常に一般的な手法になっています。ただし、NMFには、非常に不利な問題があるという欠点があり、つまり、通常、多くの異なるが同等の因数分解が存在するということです。この論文では、解がまばらな、より適切に提起されたNMF問題を取得するためのまったく新しい方法を紹介します。私たちの手法は、非負の入力データ行列の前処理に基づいており、M行列の理論とNMFの幾何学的解釈に依存しています。このアプローチは、Donoho and Stodden (2003)の分離可能性の仮定の下で最適解とスパース解を導くことが証明され、ランク3行列の場合、厳密分解の数が有限になります。この手法の有効性を、いくつかの画像データセットで説明します。

Large-scale Linear Support Vector Regression
大規模線形サポートベクトル回帰

Support vector regression (SVR) and support vector classification (SVC) are popular learning techniques, but their use with kernels is often time consuming. Recently, linear SVC without kernels has been shown to give competitive accuracy for some applications, but enjoys much faster training/testing. However, few studies have focused on linear SVR. In this paper, we extend state-of-the-art training methods for linear SVC to linear SVR. We show that the extension is straightforward for some methods, but is not trivial for some others. Our experiments demonstrate that for some problems, the proposed linear-SVR training methods can very efficiently produce models that are as good as kernel SVR.

サポートベクトル回帰(SVR)とサポートベクトル分類(SVC)は一般的な学習手法ですが、カーネルでの使用には時間がかかることがよくあります。最近、カーネルを使用しない線形SVCは、一部のアプリケーションで競争力のある精度を提供することが示されていますが、トレーニング/テストははるかに高速です。しかし、線形SVRに焦点を当てた研究はほとんどありません。この論文では、線形SVCの最先端の学習方法を線形SVRに拡張します。この拡張機能は、一部の方法では簡単ですが、他のメソッドでは簡単ではないことを示します。私たちの実験では、いくつかの問題について、提案された線形SVRトレーニング方法がカーネルSVRと同等のモデルを非常に効率的に生成できることを示しています。

Human Gesture Recognition on Product Manifolds
製品多様体での人間のジェスチャー認識

Action videos are multidimensional data and can be naturally represented as data tensors. While tensor computing is widely used in computer vision, the geometry of tensor space is often ignored. The aim of this paper is to demonstrate the importance of the intrinsic geometry of tensor space which yields a very discriminating structure for action recognition. We characterize data tensors as points on a product manifold and model it statistically using least squares regression. To this aim, we factorize a data tensor relating to each order of the tensor using Higher Order Singular Value Decomposition (HOSVD) and then impose each factorized element on a Grassmann manifold. Furthermore, we account for underlying geometry on manifolds and formulate least squares regression as a composite function. This gives a natural extension from Euclidean space to manifolds. Consequently, classification is performed using geodesic distance on a product manifold where each factor manifold is Grassmannian. Our method exploits appearance and motion without explicitly modeling the shapes and dynamics. We assess the proposed method using three gesture databases, namely the Cambridge hand-gesture, the UMD Keck body-gesture, and the CHALEARN gesture challenge data sets. Experimental results reveal that not only does the proposed method perform well on the standard benchmark data sets, but also it generalizes well on the one-shot-learning gesture challenge. Furthermore, it is based on a simple statistical model and the intrinsic geometry of tensor space.

アクションビデオは多次元データであり、データテンソルとして自然に表現できます。テンソルコンピューティングはコンピュータービジョンで広く使用されていますが、テンソル空間のジオメトリは無視されることがよくあります。この論文の目的は、アクション認識のための非常に識別力のある構造を生み出すテンソル空間の固有のジオメトリの重要性を示すことです。データテンソルを積多様体上の点として特徴付け、最小二乗回帰を使用して統計的にモデル化します。この目的のために、高次特異値分解(HOSVD)を使用してテンソルの各次数に関連するデータテンソルを因数分解し、因数分解された各要素をグラスマン多様体に課します。さらに、多様体の基礎となるジオメトリを考慮し、最小二乗回帰を合成関数として定式化します。これにより、ユークリッド空間から多様体への自然な拡張が実現します。その結果、各因子多様体がグラスマン多様体である積多様体上の測地線距離を使用して分類が実行されます。私たちの方法は、形状とダイナミクスを明示的にモデル化することなく、外観と動きを活用します。ケンブリッジハンドジェスチャー、UMDケックボディジェスチャー、およびCHALEARNジェスチャーチャレンジデータセットの3つのジェスチャーデータベースを使用して、提案された方法を評価します。実験結果から、提案された方法は標準ベンチマークデータセットで優れたパフォーマンスを発揮するだけでなく、ワンショット学習ジェスチャーチャレンジでも適切に一般化できることが明らかになりました。さらに、この方法は単純な統計モデルとテンソル空間の固有の幾何学に基づいています。

Linear Fitted-Q Iteration with Multiple Reward Functions
複数の報酬関数による線形 Fitted-Q 反復

We present a general and detailed development of an algorithm for finite-horizon fitted-Q iteration with an arbitrary number of reward signals and linear value function approximation using an arbitrary number of state features. This includes a detailed treatment of the 3-reward function case using triangulation primitives from computational geometry and a method for identifying globally dominated actions. We also present an example of how our methods can be used to construct a real-world decision aid by considering symptom reduction, weight gain, and quality of life in sequential treatments for schizophrenia. Finally, we discuss future directions in which to take this work that will further enable our methods to make a positive impact on the field of evidence-based clinical decision support.

私たちは、任意の数の状態特徴を使用した任意の数の報酬信号と線形値関数近似を使用した有限地平線適合Q反復のアルゴリズムの一般的かつ詳細な開発を示します。これには、計算幾何学からの三角測量プリミティブを使用した3報酬関数ケースの詳細な取り扱いと、グローバルに支配されたアクションを特定する方法が含まれます。また、統合失調症の逐次治療における症状の軽減、体重増加、および生活の質を考慮することにより、私たちの方法を使用して現実世界の意思決定支援を構築する方法の例も示します。最後に、エビデンスに基づく臨床意思決定支援の分野にプラスの影響を与えるために、私たちの方法をさらに可能にするこの研究の将来の方向性について説明します。

Sally: A Tool for Embedding Strings in Vector Spaces
Sally: ベクトル空間に文字列を埋め込むためのツール

Strings and sequences are ubiquitous in many areas of data analysis. However, only few learning methods can be directly applied to this form of data. We present Sally, a tool for embedding strings in vector spaces that allows for applying a wide range of learning methods to string data. Sally implements a generalized form of the bag-of-words model, where strings are mapped to a vector space that is spanned by a set of string features, such as words or n-grams of words. The implementation of Sally builds on efficient string algorithms and enables processing millions of strings and features. The tool supports several data formats and is capable of interfacing with common learning environments, such as Weka, Shogun, Matlab, or Pylab. Sally has been successfully applied for learning with natural language text, DNA sequences and monitored program behavior.

文字列とシーケンスは、データ分析の多くの分野で遍在しています。ただし、この形式のデータに直接適用できる学習方法はごくわずかです。Sallyは、ベクトル空間に文字列を埋め込むためのツールで、文字列データに幅広い学習方法を適用できるツールです。Sallyは、単語のn-gramなど、一連の文字列特徴が広がるベクトル空間に文字列がマップされる、bag-of-wordsモデルの一般化された形式を実装します。Sallyの実装は、効率的な文字列アルゴリズムに基づいて構築されており、何百万もの文字列と機能を処理できます。このツールはいくつかのデータ形式をサポートしており、Weka、Shogun、Matlab、Pylabなどの一般的な学習環境とインターフェースすることができます。サリーは、自然言語テキスト、DNA配列、および監視されたプログラムの動作による学習にうまく適用されています。

Dynamic Policy Programming
動的ポリシープログラミング

In this paper, we propose a novel policy iteration method, called dynamic policy programming (DPP), to estimate the optimal policy in the infinite-horizon Markov decision processes. DPP is an incremental algorithm that forces a gradual change in policy update. This allows us to prove finite-iteration and asymptotic l∞-norm performance-loss bounds in the presence of approximation/estimation error which depend on the average accumulated error as opposed to the standard bounds which are expressed in terms of the supremum of the errors. The dependency on the average error is important in problems with limited number of samples per iteration, for which the average of the errors can be significantly smaller in size than the supremum of the errors. Based on these theoretical results, we prove that a sampling-based variant of DPP (DPP-RL) asymptotically converges to the optimal policy. Finally, we illustrate numerically the applicability of these results on some benchmark problems and compare the performance of the approximate variants of DPP with some existing reinforcement learning (RL) methods.

この論文では、無限時間マルコフ決定過程における最適ポリシーを推定するための、動的ポリシープログラミング(DPP)と呼ばれる新しいポリシー反復法を提案します。DPPは、ポリシー更新を徐々に変更する増分アルゴリズムです。これにより、誤差の上限で表される標準的な境界ではなく、平均累積誤差に依存する近似/推定誤差がある場合の有限反復および漸近的なl∞ ノルム性能損失境界を証明できます。平均誤差への依存性は、反復あたりのサンプル数が限られている問題で重要であり、その場合、誤差の平均は誤差の上限よりも大幅に小さくなる可能性があります。これらの理論的結果に基づいて、DPPのサンプリングベースのバリアント(DPP-RL)が漸近的に最適ポリシーに収束することを証明しています。最後に、いくつかのベンチマーク問題に対するこれらの結果の適用可能性を数値的に示し、DPPの近似バリアントのパフォーマンスをいくつかの既存の強化学習(RL)手法と比較します。

Quantum Set Intersection and its Application to Associative Memory
量子集合交差とその連想記憶への応用

We describe a quantum algorithm for computing the intersection of two sets and its application to associative memory. The algorithm is based on a modification of Grover’s quantum search algorithm (Grover, 1996). We present algorithms for pattern retrieval, pattern completion, and pattern correction. We show that the quantum associative memory can store an exponential number of memories and retrieve them in sub-exponential time. We prove that this model has advantages over known classical associative memories as well as previously proposed quantum models.

私たちは、2つのセットの交差を計算するための量子アルゴリズムと、その連想メモリへの適用について説明します。このアルゴリズムは、グローバーの量子探索アルゴリズム(Grover, 1996)の修正に基づいています。パターン検索、パターン補完、およびパターン修正のアルゴリズムを紹介します。量子連想メモリが指数関数的な数のメモリを格納し、それらを亜指数関数的な時間で取得できることを示します。このモデルが、既知の古典的連想記憶や以前に提案された量子モデルよりも優れていることを証明します。

Bayesian Mixed-Effects Inference on Classification Performance in Hierarchical Data Sets
階層データセットにおける分類性能に対するベイズ混合効果推論

Classification algorithms are frequently used on data with a natural hierarchical structure. For instance, classifiers are often trained and tested on trial-wise measurements, separately for each subject within a group. One important question is how classification outcomes observed in individual subjects can be generalized to the population from which the group was sampled. To address this question, this paper introduces novel statistical models that are guided by three desiderata. First, all models explicitly respect the hierarchical nature of the data, that is, they are mixed-effects models that simultaneously account for within-subjects (fixed-effects) and across-subjects (random-effects) variance components. Second, maximum-likelihood estimation is replaced by full Bayesian inference in order to enable natural regularization of the estimation problem and to afford conclusions in terms of posterior probability statements. Third, inference on classification accuracy is complemented by inference on the balanced accuracy, which avoids inflated accuracy estimates for imbalanced data sets. We introduce hierarchical models that satisfy these criteria and demonstrate their advantages over conventional methods using MCMC implementations for model inversion and model selection on both synthetic and empirical data. We envisage that our approach will improve the sensitivity and validity of statistical inference in future hierarchical classification studies.

分類アルゴリズムは、自然な階層構造を持つデータでよく使用されます。たとえば、分類器は、グループ内の各被験者に対して個別に、試行ごとの測定でトレーニングおよびテストされることがよくあります。重要な質問の1つは、個々の被験者で観察された分類結果を、グループがサンプリングされた集団にどのように一般化できるかということです。この質問に対処するために、この論文では、3つの要件によって導かれる新しい統計モデルを紹介します。まず、すべてのモデルは、データの階層的性質を明示的に尊重します。つまり、被験者内(固定効果)と被験者間(ランダム効果)の分散成分を同時に考慮する混合効果モデルです。次に、最大尤度推定は、推定問題の自然な正規化を可能にし、事後確率ステートメントの観点から結論を出すために、完全なベイズ推論に置き換えられます。3番目に、分類精度の推論は、不均衡なデータセットに対する過大な精度推定を回避するバランス精度の推論によって補完されます。私たちは、これらの基準を満たす階層モデルを紹介し、合成データと経験的データの両方でモデル反転とモデル選択にMCMC実装を使用して、従来の方法よりも優れていることを実証します。私たちのアプローチにより、将来の階層分類研究における統計的推論の感度と妥当性が向上すると予想しています。

Breaking the Curse of Kernelization: Budgeted Stochastic Gradient Descent for Large-Scale SVM Training
カーネル化の呪縛を打破する:大規模SVMトレーニングのための予算化された確率的勾配降下法

Online algorithms that process one example at a time are advantageous when dealing with very large data or with data streams. Stochastic Gradient Descent (SGD) is such an algorithm and it is an attractive choice for online Support Vector Machine (SVM) training due to its simplicity and effectiveness. When equipped with kernel functions, similarly to other SVM learning algorithms, SGD is susceptible to the curse of kernelization that causes unbounded linear growth in model size and update time with data size. This may render SGD inapplicable to large data sets. We address this issue by presenting a class of Budgeted SGD (BSGD) algorithms for large-scale kernel SVM training which have constant space and constant time complexity per update. Specifically, BSGD keeps the number of support vectors bounded during training through several budget maintenance strategies. We treat the budget maintenance as a source of the gradient error, and show that the gap between the BSGD and the optimal SVM solutions depends on the model degradation due to budget maintenance. To minimize the gap, we study greedy budget maintenance methods based on removal, projection, and merging of support vectors. We propose budgeted versions of several popular online SVM algorithms that belong to the SGD family. We further derive BSGD algorithms for multi-class SVM training. Comprehensive empirical results show that BSGD achieves higher accuracy than the state-of-the-art budgeted online algorithms and comparable to non-budget algorithms, while achieving impressive computational efficiency both in time and space during training and prediction.

一度に1つの例を処理するオンラインアルゴリズムは、非常に大きなデータやデータストリームを処理する場合に有利です。確率的勾配降下法(SGD)はそのようなアルゴリズムであり、そのシンプルさと有効性から、オンラインサポートベクターマシン(SVM)トレーニングの魅力的な選択肢です。カーネル関数を装備すると、他のSVM学習アルゴリズムと同様に、SGDはカーネル化の呪いの影響を受けやすく、モデルサイズと更新時間がデータサイズとともに無制限に線形に増加します。これにより、SGDは大規模なデータセットには適用できなくなる可能性があります。この問題に対処するために、大規模なカーネルSVMトレーニング用のBudgeted SGD (BSGD)アルゴリズムのクラスを提示します。このアルゴリズムは、更新ごとに一定の空間と一定の時間の計算量を持ちます。具体的には、BSGDは、いくつかの予算維持戦略を通じて、トレーニング中にサポートベクターの数を制限します。予算維持を勾配エラーの原因として扱い、BSGDと最適なSVMソリューションのギャップは、予算維持によるモデルの劣化に依存することを示します。このギャップを最小限に抑えるために、サポートベクターの削除、投影、およびマージに基づく貪欲な予算管理方法を研究します。SGDファミリーに属するいくつかの一般的なオンラインSVMアルゴリズムの予算バージョンを提案します。さらに、マルチクラスSVMトレーニング用のBSGDアルゴリズムを導出します。包括的な実験結果によると、BSGDは最先端の予算オンラインアルゴリズムよりも高い精度を達成し、非予算アルゴリズムに匹敵する一方で、トレーニングと予測中に時間と空間の両方で優れた計算効率を達成します。

Discriminative Hierarchical Part-based Models for Human Parsing and Action Recognition
人間による構文解析と行動認識のための判別的階層部分ベースモデル

We consider the problem of parsing human poses and recognizing their actions in static images with part-based models. Most previous work in part-based models only considers rigid parts (e.g., torso, head, half limbs) guided by human anatomy. We argue that this representation of parts is not necessarily appropriate. In this paper, we introduce hierarchical poselets—a new representation for modeling the pose configuration of human bodies. Hierarchical poselets can be rigid parts, but they can also be parts that cover large portions of human bodies (e.g., torso + left arm). In the extreme case, they can be the whole bodies. The hierarchical poselets are organized in a hierarchical way via a structured model. Human parsing can be achieved by inferring the optimal labeling of this hierarchical model. The pose information captured by this hierarchical model can also be used as a intermediate representation for other high-level tasks. We demonstrate it in action recognition from static images.

私たちは、パーツベースのモデルを使用して、静止画像における人間のポーズを解析し、その動作を認識する問題について検討します。これまでのパーツベースモデルのほとんどの研究では、人体の解剖学に基づいた硬いパーツ（胴体、頭、半手足など）のみを考慮しています。私たちは、パーツのこの表現は必ずしも適切ではないと主張します。この論文では、人体のポーズ構成をモデル化するための新しい表現である階層的ポーズレットを紹介します。階層的ポーズレットは硬いパーツにすることもできますが、人体の大部分をカバーするパーツ（胴体+左腕など）にすることもできます。極端な場合には、全身になることもあります。階層的ポーズレットは、構造化モデルを介して階層的に編成されます。人間の解析は、この階層モデルの最適なラベル付けを推論することで実現できます。この階層モデルによってキャプチャされたポーズ情報は、他の高レベルタスクの中間表現としても使用できます。静止画像からの動作認識でこれを実証します。

Finite-Sample Analysis of Least-Squares Policy Iteration
最小二乗方策反復の有限サンプル分析

In this paper, we report a performance bound for the widely used least-squares policy iteration (LSPI) algorithm. We first consider the problem of policy evaluation in reinforcement learning, that is, learning the value function of a fixed policy, using the least-squares temporal-difference (LSTD) learning method, and report finite-sample analysis for this algorithm. To do so, we first derive a bound on the performance of the LSTD solution evaluated at the states generated by the Markov chain and used by the algorithm to learn an estimate of the value function. This result is general in the sense that no assumption is made on the existence of a stationary distribution for the Markov chain. We then derive generalization bounds in the case when the Markov chain possesses a stationary distribution and is β-mixing. Finally, we analyze how the error at each policy evaluation step is propagated through the iterations of a policy iteration method, and derive a performance bound for the LSPI algorithm.

この論文では、広く使用されている最小二乗ポリシー反復(LSPI)アルゴリズムのパフォーマンス境界を報告します。まず、強化学習におけるポリシー評価の問題、つまり、最小二乗時間差分(LSTD)学習法を使用して固定ポリシーの価値関数を学習する問題について検討し、このアルゴリズムの有限サンプル分析を報告します。そのために、まず、マルコフ連鎖によって生成された状態で評価され、価値関数の推定値を学習するためにアルゴリズムによって使用されるLSTDソリューションのパフォーマンスの境界を導出します。この結果は、マルコフ連鎖の定常分布の存在について仮定しないという意味で一般的なものです。次に、マルコフ連鎖が定常分布を持ち、β 混合である場合の一般化境界を導出します。最後に、各ポリシー評価ステップでの誤差がポリシー反復法の反復を通じてどのように伝播するかを分析し、LSPIアルゴリズムのパフォーマンス境界を導出します。

Multi-Instance Learning with Any Hypothesis Class
任意の仮説クラスを使用したマルチインスタンス学習

In the supervised learning setting termed Multiple-Instance Learning (MIL), the examples are bags of instances, and the bag label is a function of the labels of its instances. Typically, this function is the Boolean OR. The learner observes a sample of bags and the bag labels, but not the instance labels that determine the bag labels. The learner is then required to emit a classification rule for bags based on the sample. MIL has numerous applications, and many heuristic algorithms have been used successfully on this problem, each adapted to specific settings or applications. In this work we provide a unified theoretical analysis for MIL, which holds for any underlying hypothesis class, regardless of a specific application or problem domain. We show that the sample complexity of MIL is only poly-logarithmically dependent on the size of the bag, for any underlying hypothesis class. In addition, we introduce a new PAC-learning algorithm for MIL, which uses a regular supervised learning algorithm as an oracle. We prove that efficient PAC-learning for MIL can be generated from any efficient non-MIL supervised learning algorithm that handles one-sided error. The computational complexity of the resulting algorithm is only polynomially dependent on the bag size.

マルチインスタンス学習(MIL)と呼ばれる教師あり学習設定では、例はインスタンスのバッグであり、バッグラベルはそのインスタンスのラベルの関数です。通常、この関数はブールORです。学習者はバッグのサンプルとバッグラベルを観察しますが、バッグラベルを決定するインスタンスラベルは観察しません。学習者は、サンプルに基づいてバッグの分類ルールを発行する必要があります。MILには多数のアプリケーションがあり、多くのヒューリスティックアルゴリズムがこの問題で効果的に使用されており、それぞれが特定の設定またはアプリケーションに適応しています。この研究では、特定のアプリケーションまたは問題ドメインに関係なく、基礎となる仮説クラスに当てはまるMILの統一された理論分析を提供します。MILのサンプル複雑度は、基礎となる仮説クラスに関係なく、バッグのサイズにのみ多重対数的に依存することを示します。さらに、通常の教師あり学習アルゴリズムをオラクルとして使用する、MILの新しいPAC学習アルゴリズムを紹介します。MIL用の効率的なPAC学習は、片側エラーを処理する任意の効率的な非MIL教師あり学習アルゴリズムから生成できることを証明します。結果として得られるアルゴリズムの計算の複雑さは、バッグのサイズに多項式的にのみ依存します。

Oger: Modular Learning Architectures For Large-Scale Sequential Processing
Oger: 大規模シーケンシャル処理のためのモジュール式学習アーキテクチャ

Oger (OrGanic Environment for Reservoir computing) is a Python toolbox for building, training and evaluating modular learning architectures on large data sets. It builds on MDP for its modularity, and adds processing of sequential data sets, gradient descent training, several cross-validation schemes and parallel parameter optimization methods. Additionally, several learning algorithms are implemented, such as different reservoir implementations (both sigmoid and spiking), ridge regression, conditional restricted Boltzmann machine (CRBM) and others, including GPU accelerated versions. Oger is released under the GNU LGPL, and is available from http://organic.elis.ugent.be/oger.

Oger (OrGanic Environment for Reservoir computing)は、大規模なデータセットでモジュール式学習アーキテクチャを構築、トレーニング、評価するためのPythonツールボックスです。これは、そのモジュール性のためにMDPに基づいて構築されており、シーケンシャルデータセットの処理、勾配降下トレーニング、いくつかの交差検証スキーム、および並列パラメータ最適化方法が追加されています。さらに、さまざまなリザーバー実装(シグモイドとスパイキングの両方)、リッジ回帰、条件付き制限ボルツマンマシン(CRBM)など、GPUアクセラレーションバージョンを含むいくつかの学習アルゴリズムが実装されています。OgerはGNU LGPLの下でリリースされており、http://organic.elis.ugent.be/ogerから入手できます。

Facilitating Score and Causal Inference Trees for Large Observational Studies
大規模な観察研究のためのスコアと因果推論木の促進

Assessing treatment effects in observational studies is a multifaceted problem that not only involves heterogeneous mechanisms of how the treatment or cause is exposed to subjects, known as propensity, but also differential causal effects across sub-populations. We introduce a concept termed the facilitating score to account for both the confounding and interacting impacts of covariates on the treatment effect. Several approaches for estimating the facilitating score are discussed. In particular, we put forward a machine learning method, called causal inference tree (CIT), to provide a piecewise constant approximation of the facilitating score. With interpretable rules, CIT splits data in such a way that both the propensity and the treatment effect become more homogeneous within each resultant partition. Causal inference at different levels can be made on the basis of CIT. Together with an aggregated grouping procedure, CIT stratifies data into strata where causal effects can be conveniently assessed within each. Besides, a feasible way of predicting individual causal effects (ICE) is made available by aggregating ensemble CIT models. Both the stratified results and the estimated ICE provide an assessment of heterogeneity of causal effects and can be integrated for estimating the average causal effect (ACE). Mean square consistency of CIT is also established. We evaluate the performance of proposed methods with simulations and illustrate their use with the NSW data in Dehejia and Wahba (1999) where the objective is to assess the impact of a labor training program, the National Supported Work (NSW) demonstration, on post-intervention earnings.

観察研究における治療効果の評価は、傾向として知られる、治療または原因が被験者にどのようにさらされるかという異質なメカニズムだけでなく、サブ集団間での異なる因果効果も含む多面的な問題です。治療効果に対する共変量の交絡および相互作用の影響の両方を説明するために、促進スコアと呼ばれる概念を導入します。促進スコアを推定するためのいくつかのアプローチについて説明します。特に、促進スコアの区分定数近似値を提供するための因果推論ツリー(CIT)と呼ばれる機械学習手法を提案します。解釈可能なルールを使用して、CITは、傾向と治療効果の両方が各結果パーティション内でより均質になるようにデータを分割します。さまざまなレベルでの因果推論は、CITに基づいて行うことができます。集約されたグループ化手順とともに、CITはデータを層に階層化し、各層内で因果効果を簡単に評価できるようにします。さらに、アンサンブルCITモデルを集約することで、個々の因果効果(ICE)を予測する実行可能な方法が利用可能になります。層別結果と推定されたICEの両方が因果効果の異質性の評価を提供し、平均因果効果(ACE)を推定するために統合できます。CITの平均二乗一貫性も確立されています。シミュレーションで提案された方法のパフォーマンスを評価し、DehejiaとWahba (1999)のNSWデータでの使用例を示します。その目的は、労働訓練プログラムであるNational Supported Work (NSW)デモンストレーションが介入後の収入に与える影響を評価することです。

Efficient Methods for Robust Classification Under Uncertainty in Kernel Matrices
カーネル行列の不確かさ下でのロバストな分類のための効率的な方法

In this paper we study the problem of designing SVM classifiers when the kernel matrix, K, is affected by uncertainty. Specifically K is modeled as a positive affine combination of given positive semi definite kernels, with the coefficients ranging in a norm-bounded uncertainty set. We treat the problem using the Robust Optimization methodology. This reduces the uncertain SVM problem into a deterministic conic quadratic problem which can be solved in principle by a polynomial time Interior Point (IP) algorithm. However, for large-scale classification problems, IP methods become intractable and one has to resort to first-order gradient type methods. The strategy we use here is to reformulate the robust counterpart of the uncertain SVM problem as a saddle point problem and employ a special gradient scheme which works directly on the convex-concave saddle function. The algorithm is a simplified version of a general scheme due to Juditski and Nemirovski (2011). It achieves an O(1/T2) reduction of the initial error after T iterations. A comprehensive empirical study on both synthetic data and real-world protein structure data sets show that the proposed formulations achieve the desired robustness, and the saddle point based algorithm outperforms the IP method significantly.

この論文では、カーネル行列Kが不確実性によって影響を受ける場合のSVM分類器の設計の問題を検討します。具体的には、Kは、係数がノルム境界の不確実性セットの範囲内にある、与えられた正の半定値カーネルの正のアフィン結合としてモデル化されます。この問題は、ロバスト最適化手法を使用して処理します。これにより、不確実なSVMの問題が、原理的には多項式時間の内点法(IP)アルゴリズムによって解決できる決定論的な円錐二次問題に簡略化されます。ただし、大規模な分類問題の場合、IP法は扱いにくくなり、1次勾配型法に頼らざるを得なくなります。ここで使用する戦略は、不確実なSVM問題のロバストな対応を鞍点問題として再定式化し、凸凹鞍関数に直接作用する特別な勾配方式を使用することです(2011)。このアルゴリズムは、JuditskiとNemirovski (2011)による一般的な方式の簡略版です。T回の反復後、初期エラーをO(1/T2)削減します。合成データと現実世界のタンパク質構造データセットの両方に関する包括的な実証的研究により、提案された定式化が望ましい堅牢性を達成し、鞍点ベースのアルゴリズムがIP法を大幅に上回ることが明らかになりました。

Online Submodular Minimization
オンライン・サブモジュラ最小化

We consider an online decision problem over a discrete space in which the loss function is submodular. We give algorithms which are computationally efficient and are Hannan-consistent in both the full information and partial feedback settings.

私たちは、損失関数がサブモジュラである離散空間上のオンライン決定問題を考えます。計算効率が高く、完全な情報設定と部分的なフィードバック設定の両方でHannanと一貫性のあるアルゴリズムを提供します。

Local and Global Scaling Reduce Hubs in Space
ローカルおよびグローバルのスケーリングにより、宇宙のハブを削減

‘Hubness’ has recently been identified as a general problem of high dimensional data spaces, manifesting itself in the emergence of objects, so-called hubs, which tend to be among the k nearest neighbors of a large number of data items. As a consequence many nearest neighbor relations in the distance space are asymmetric, that is, object y is amongst the nearest neighbors of x but not vice versa. The work presented here discusses two classes of methods that try to symmetrize nearest neighbor relations and investigates to what extent they can mitigate the negative effects of hubs. We evaluate local distance scaling and propose a global variant which has the advantage of being easy to approximate for large data sets and of having a probabilistic interpretation. Both local and global approaches are shown to be effective especially for high-dimensional data sets, which are affected by high hubness. Both methods lead to a strong decrease of hubness in these data sets, while at the same time improving properties like classification accuracy. We evaluate the methods on a large number of public machine learning data sets and synthetic data. Finally we present a real-world application where we are able to achieve significantly higher retrieval quality.

「ハブネス」は、高次元データ空間の一般的な問題として最近認識され、多数のデータ項目のk近傍に含まれる傾向があるオブジェクト、いわゆるハブの出現として現れます。その結果、距離空間における多くの近傍関係は非対称です。つまり、オブジェクトyはxの近傍にありますが、その逆は成り立ちません。ここで紹介する作業では、近傍関係を対称化しようとする2つのクラスの方法について説明し、ハブの悪影響をどの程度軽減できるかを調査します。ローカル距離スケーリングを評価し、大規模なデータセットの近似が容易で、確率的解釈ができるという利点があるグローバルバリアントを提案します。ローカルアプローチとグローバルアプローチの両方が、特にハブネスの影響を受ける高次元データセットに効果的であることが示されています。どちらの方法も、これらのデータセットのハブネスを大幅に減少させ、同時に分類精度などの特性を改善します。多数の公開機械学習データセットと合成データでこれらの方法を評価します。最後に、大幅に高い検索品質を実現できる実際のアプリケーションを紹介します。

A Unified View of Performance Metrics: Translating Threshold Choice into Expected Classification Loss
パフォーマンスメトリクスの統合ビュー:しきい値選択を予想分類損失に変換

Many performance metrics have been introduced in the literature for the evaluation of classification performance, each of them with different origins and areas of application. These metrics include accuracy, unweighted accuracy, the area under the ROC curve or the ROC convex hull, the mean absolute error and the Brier score or mean squared error (with its decomposition into refinement and calibration). One way of understanding the relations among these metrics is by means of variable operating conditions (in the form of misclassification costs and/or class distributions). Thus, a metric may correspond to some expected loss over different operating conditions. One dimension for the analysis has been the distribution for this range of operating conditions, leading to some important connections in the area of proper scoring rules. We demonstrate in this paper that there is an equally important dimension which has so far received much less attention in the analysis of performance metrics. This dimension is given by the decision rule, which is typically implemented as a threshold choice method when using scoring models. In this paper, we explore many old and new threshold choice methods: fixed, score-uniform, score-driven, rate-driven and optimal, among others. By calculating the expected loss obtained with these threshold choice methods for a uniform range of operating conditions we give clear interpretations of the 0-1 loss, the absolute error, the Brier score, the AUC and the refinement loss respectively. Our analysis provides a comprehensive view of performance metrics as well as a systematic approach to loss minimisation which can be summarised as follows: given a model, apply the threshold choice methods that correspond with the available information about the operating condition, and compare their expected losses. In order to assist in this procedure we also derive several connections between the aforementioned performance metrics, and we highlight the role of calibration in choosing the threshold choice method.

分類性能を評価するための多くの性能指標が文献で紹介されており、それぞれ起源と適用分野が異なります。これらの指標には、精度、重み付けなしの精度、ROC曲線またはROC凸包の下の領域、平均絶対誤差、およびブライアスコアまたは平均二乗誤差（その改良と較正への分解を含む）が含まれます。これらの指標間の関係を理解する1つの方法は、可変の動作条件（誤分類コストおよび/またはクラス分布の形式）を使用することです。したがって、指標は、さまざまな動作条件での予想される損失に対応する場合があります。分析の1つの次元は、この動作条件の範囲の分布であり、適切なスコアリングルールの領域でいくつかの重要なつながりにつながっています。この論文では、これまでパフォーマンス指標の分析であまり注目されていなかった、同様に重要な次元があることを示します。この次元は、スコアリングモデルを使用するときにしきい値選択方法として通常実装される決定ルールによって与えられます。この論文では、固定、スコア均一、スコア駆動、レート駆動、最適など、多くの古いしきい値選択方法と新しいしきい値選択方法を検討します。これらのしきい値選択方法を使用して、均一な動作条件の範囲で得られる期待損失を計算することにより、0-1損失、絶対誤差、ブライアースコア、AUC、および精製損失をそれぞれ明確に解釈できます。私たちの分析は、パフォーマンスメトリックの包括的なビューと、損失最小化への体系的なアプローチを提供します。これは次のように要約できます。モデルが与えられた場合、動作条件に関する利用可能な情報に対応するしきい値選択方法を適用し、それらの期待損失を比較します。この手順を支援するために、前述のパフォーマンスメトリック間のいくつかの関係も導き出し、しきい値選択方法を選択する際のキャリブレーションの役割を強調します。

Multi-task Regression using Minimal Penalties
最小ペナルティを使用したマルチタスク回帰

In this paper we study the kernel multiple ridge regression framework, which we refer to as multi-task regression, using penalization techniques. The theoretical analysis of this problem shows that the key element appearing for an optimal calibration is the covariance matrix of the noise between the different tasks. We present a new algorithm to estimate this covariance matrix, based on the concept of minimal penalty, which was previously used in the single-task regression framework to estimate the variance of the noise. We show, in a non-asymptotic setting and under mild assumptions on the target function, that this estimator converges towards the covariance matrix. Then plugging this estimator into the corresponding ideal penalty leads to an oracle inequality. We illustrate the behavior of our algorithm on synthetic examples.

この論文では、ペナルティ化手法を使用して、カーネルのマルチリッジ回帰フレームワーク(マルチタスク回帰と呼ばれる)について研究します。この問題の理論的分析は、最適なキャリブレーションのために現れる重要な要素が、異なるタスク間のノイズの共分散行列であることを示しています。この共分散行列を推定する新しいアルゴリズムを、以前は単一タスク回帰フレームワークでノイズの分散を推定するために使用されていた最小ペナルティの概念に基づいて提示します。非漸近的な設定で、ターゲット関数の穏やかな仮定の下で、この推定量が共分散行列に向かって収束することを示します。次に、この推定量を対応する理想的なペナルティに当てはめると、オラクルの不等式が発生します。このアルゴリズムの動作を合成例で説明します。

Linear Regression With Random Projections
ランダム射影による線形回帰

We investigate a method for regression that makes use of a randomly generated subspace GP⊂F (of finite dimension P) of a given large (possibly infinite) dimensional function space F, for example, L2([0,1]d;ℜ). GP is defined as the span of P random features that are linear combinations of a basis functions of F weighted by random Gaussian i.i.d. coefficients. We show practical motivation for the use of this approach, detail the link that this random projections method share with RKHS and Gaussian objects theory and prove, both in deterministic and random design, approximation error bounds when searching for the best regression function in GP rather than in F, and derive excess risk bounds for a specific regression algorithm (least squares regression in GP). This paper stresses the motivation to study such methods, thus the analysis developed is kept simple for explanations purpose and leaves room for future developments.

私たちは、与えられた大きな(おそらく無限の)関数空間F、例えばL2([0,1]d;R)のランダムに生成された部分空間GP⊂F(有限次元P)を利用する回帰の方法を調査します。GPは、Fの基底関数をランダムなガウスi.i.d.係数で重み付けした線形結合であるP個のランダム特徴の範囲として定義されます。このアプローチを使用する実際的な動機を示し、このランダム射影法がRKHSおよびガウスオブジェクト理論と共有するリンクを詳しく説明し、決定論的およびランダムデザインの両方で、FではなくGPで最適な回帰関数を検索する際の近似誤差範囲を証明し、特定の回帰アルゴリズム(GPの最小二乗回帰)の過剰なリスク範囲を導き出します。この論文では、そのような方法を研究する動機を強調しているため、開発された分析は説明目的でシンプルに保たれ、将来の開発の余地を残しています。

Coherence Functions with Applications in Large-Margin Classification Methods
大マージン分類法における応用によるコヒーレンス関数

Support vector machines (SVMs) naturally embody sparseness due to their use of hinge loss functions. However, SVMs can not directly estimate conditional class probabilities. In this paper we propose and study a family of coherence functions, which are convex and differentiable, as surrogates of the hinge function. The coherence function is derived by using the maximum-entropy principle and is characterized by a temperature parameter. It bridges the hinge function and the logit function in logistic regression. The limit of the coherence function at zero temperature corresponds to the hinge function, and the limit of the minimizer of its expected error is the minimizer of the expected error of the hinge loss. We refer to the use of the coherence function in large-margin classification as “C-learning,” and we present efficient coordinate descent algorithms for the training of regularized C-learning models.

サポートベクターマシン(SVM)は、ヒンジ損失関数を使用しているため、自然にスパース性を具現化します。ただし、SVMは条件付きクラス確率を直接推定することはできません。この論文では、ヒンジ関数の代理として、凸型で微分可能なコヒーレンス関数のファミリーを提案し、研究します。コヒーレンス関数は、最大エントロピーの原理を使用して導出され、温度パラメーターによって特徴付けられます。これは、ロジスティック回帰のヒンジ機能とロジット関数を橋渡しします。ゼロ温度でのコヒーレンス関数の限界はヒンジ関数に対応し、その期待誤差の最小化器の限界は、ヒンジ損失の期待誤差の最小化器です。大マージン分類におけるコヒーレンス関数の使用を「C学習」と呼び、正則化されたC学習モデルの学習のための効率的な座標降下アルゴリズムを提示します。

PREA: Personalized Recommendation Algorithms Toolkit
PREA: パーソナライズされたレコメンデーションアルゴリズムツールキット

Recommendation systems are important business applications with significant economic impact. In recent years, a large number of algorithms have been proposed for recommendation systems. In this paper, we describe an open-source toolkit implementing many recommendation algorithms as well as popular evaluation metrics. In contrast to other packages, our toolkit implements recent state-of-the-art algorithms as well as most classic algorithms.

レコメンデーションシステムは、大きな経済的影響を持つ重要なビジネスアプリケーションです。近年、レコメンデーションシステムとして多数のアルゴリズムが提案されています。この論文では、多くの推奨アルゴリズムと一般的な評価指標を実装するオープンソースのツールキットについて説明します。他のパッケージとは対照的に、私たちのツールキットは、最新の最先端のアルゴリズムとほとんどの古典的なアルゴリズムを実装しています。

Selective Sampling and Active Learning from Single and Multiple Teachers
単一および複数の教師による選択的サンプリングとアクティブラーニング

We present a new online learning algorithm in the selective sampling framework, where labels must be actively queried before they are revealed. We prove bounds on the regret of our algorithm and on the number of labels it queries when faced with an adaptive adversarial strategy of generating the instances. Our bounds both generalize and strictly improve over previous bounds in similar settings. Additionally, our selective sampling algorithm can be converted into an efficient statistical active learning algorithm. We extend our algorithm and analysis to the multiple-teacher setting, where the algorithm can choose which subset of teachers to query for each label. Finally, we demonstrate the effectiveness of our techniques on a real-world Internet search problem.

私たちは、選択的サンプリングフレームワークの新しいオンライン学習アルゴリズムを紹介します。ラベルは、ラベルが明らかになる前に積極的にクエリする必要があります。私たちは、アルゴリズムの後悔と、インスタンスを生成する適応的な敵対的戦略に直面したときにクエリするラベルの数に限界があることを証明します。私たちの境界は、同様の設定で以前の境界よりも一般化され、厳密に改善されます。さらに、当社の選択的サンプリングアルゴリズムは、効率的な統計的アクティブラーニングアルゴリズムに変換することができます。アルゴリズムと分析を複数教師の設定に拡張し、アルゴリズムが各ラベルに対してクエリを実行する教師のサブセットを選択できるようにします。最後に、実際のインターネット検索問題に対する私たちの手法の有効性を示します。

Static Prediction Games for Adversarial Learning Problems
敵対的学習問題に対する静的予測ゲーム

The standard assumption of identically distributed training and test data is violated when the test data are generated in response to the presence of a predictive model. This becomes apparent, for example, in the context of email spam filtering. Here, email service providers employ spam filters, and spam senders engineer campaign templates to achieve a high rate of successful deliveries despite the filters. We model the interaction between the learner and the data generator as a static game in which the cost functions of the learner and the data generator are not necessarily antagonistic. We identify conditions under which this prediction game has a unique Nash equilibrium and derive algorithms that find the equilibrial prediction model. We derive two instances, the Nash logistic regression and the Nash support vector machine, and empirically explore their properties in a case study on email spam filtering.

トレーニングデータとテストデータが同一に分散されるという標準的な仮定は、予測モデルの存在に応答してテストデータが生成される場合、違反します。これは、たとえば、電子メールスパムフィルタリングのコンテキストで明らかになります。ここでは、電子メールサービスプロバイダーはスパムフィルターを採用し、スパム送信者はキャンペーンテンプレートを設計して、フィルターにもかかわらず高い成功率を達成します。学習器とデータ生成器との間の相互作用は、学習器とデータ生成器のコスト関数が必ずしも敵対的ではない静的なゲームとしてモデル化します。この予測ゲームが独自のナッシュ均衡を持つ条件を特定し、均衡予測モデルを見つけるアルゴリズムを導き出します。Nashロジスティック回帰とNashサポートベクターマシンの2つのインスタンスを導き出し、電子メールスパムフィルタリングのケーススタディでそれらの特性を経験的に調査します。

Finding Recurrent Patterns from Continuous Sign Language Sentences for Automated Extraction of Signs
手話の自動抽出のための連続手話言語文からの反復パターンの発見

We present a probabilistic framework to automatically learn models of recurring signs from multiple sign language video sequences containing the vocabulary of interest. We extract the parts of the signs that are present in most occurrences of the sign in context and are robust to the variations produced by adjacent signs. Each sentence video is first transformed into a multidimensional time series representation, capturing the motion and shape aspects of the sign. Skin color blobs are extracted from frames of color video sequences, and a probabilistic relational distribution is formed for each frame using the contour and edge pixels from the skin blobs. Each sentence is represented as a trajectory in a low dimensional space called the space of relational distributions. Given these time series trajectories, we extract signemes from multiple sentences concurrently using iterated conditional modes (ICM). We show results by learning single signs from a collection of sentences with one common pervading sign, multiple signs from a collection of sentences with more than one common sign, and single signs from a mixed collection of sentences. The extracted signemes demonstrate that our approach is robust to some extent to the variations produced within a sign due to different contexts. We also show results whereby these learned sign models are used for spotting signs in test sequences.

私たちは、関心語彙を含む複数の手話ビデオシーケンスから、繰り返し出現する手話のモデルを自動的に学習する確率的フレームワークを提示します。文脈内で手話が最も多く出現し、隣接する手話によって生じる変化に対してロバストな手話の部分を抽出した。各文のビデオは、まず多次元時系列表現に変換され、手話の動きと形状の側面を捉える。肌の色の塊はカラービデオシーケンスのフレームから抽出され、肌の色の塊の輪郭とエッジピクセルを使用して、各フレームの確率的関係分布が形成されます。各文は、関係分布空間と呼ばれる低次元空間内の軌跡として表されます。これらの時系列軌跡が与えられたら、反復条件モード(ICM)を使用して複数の文から同時に手話素を抽出した。私たちは、1つの共通する手話を持つ文のコレクションから単一の手話、複数の共通する手話を持つ文のコレクションから複数の手話、および混合文のコレクションから単一の手話を学習することで結果を示した。抽出された記号は、私たちのアプローチが、異なるコンテキストによって記号内に生じる変化に対してある程度堅牢であることを示しています。また、学習したこれらの記号モデルをテストシーケンス内の記号の検出に使用する結果も示します。

Nonparametric Guidance of Autoencoder Representations using Label Information
ラベル情報を用いた自己符号化器表現のノンパラメトリックガイダンス

While unsupervised learning has long been useful for density modeling, exploratory data analysis and visualization, it has become increasingly important for discovering features that will later be used for discriminative tasks. Discriminative algorithms often work best with highly-informative features; remarkably, such features can often be learned without the labels. One particularly effective way to perform such unsupervised learning has been to use autoencoder neural networks, which find latent representations that are constrained but nevertheless informative for reconstruction. However, pure unsupervised learning with autoencoders can find representations that may or may not be useful for the ultimate discriminative task. It is a continuing challenge to guide the training of an autoencoder so that it finds features which will be useful for predicting labels. Similarly, we often have a priori information regarding what statistical variation will be irrelevant to the ultimate discriminative task, and we would like to be able to use this for guidance as well. Although a typical strategy would be to include a parametric discriminative model as part of the autoencoder training, here we propose a nonparametric approach that uses a Gaussian process to guide the representation. By using a nonparametric model, we can ensure that a useful discriminative function exists for a given set of features, without explicitly instantiating it. We demonstrate the superiority of this guidance mechanism on four data sets, including a real-world application to rehabilitation research. We also show how our proposed approach can learn to explicitly ignore statistically significant covariate information that is label-irrelevant, by evaluating on the small NORB image recognition problem in which pose and lighting labels are available.

教師なし学習は、密度モデリング、探索的データ分析、視覚化に長い間役立ってきましたが、後に識別タスクに使用される特徴を発見する上でますます重要になっています。識別アルゴリズムは、多くの場合、非常に有益な特徴で最も効果的に機能します。驚くべきことに、そのような特徴はラベルなしで学習できることがよくあります。そのような教師なし学習を実行するための特に効果的な方法の1つは、制約されているものの再構築に役立つ潜在的な表現を見つけるオートエンコーダーニューラルネットワークを使用することです。ただし、オートエンコーダーを使用した純粋な教師なし学習では、最終的な識別タスクに役立つかどうかわからない表現を見つけることができます。ラベルの予測に役立つ特徴を見つけるようにオートエンコーダーのトレーニングをガイドすることは、継続的な課題です。同様に、最終的な識別タスクに無関係な統計的変動に関する事前情報を持っていることが多く、これをガイドとしても使用できるようにしたいと考えています。典型的な戦略は、オートエンコーダのトレーニングの一部としてパラメトリック識別モデルを含めることですが、ここでは、ガウス過程を使用して表現をガイドするノンパラメトリックアプローチを提案します。ノンパラメトリックモデルを使用することで、明示的にインスタンス化することなく、特定の機能セットに対して有用な識別関数が存在することを確認できます。リハビリテーション研究への実際のアプリケーションを含む4つのデータセットで、このガイダンスメカニズムの優位性を実証します。また、ポーズと照明のラベルが利用可能な小さなNORB画像認識問題で評価することにより、提案されたアプローチがラベルに関連しない統計的に有意な共変量情報を明示的に無視することを学習する方法を示します。

Robust Kernel Density Estimation
ロバストなカーネル密度推定

We propose a method for nonparametric density estimation that exhibits robustness to contamination of the training sample. This method achieves robustness by combining a traditional kernel density estimator (KDE) with ideas from classical M-estimation. We interpret the KDE based on a positive semi-definite kernel as a sample mean in the associated reproducing kernel Hilbert space. Since the sample mean is sensitive to outliers, we estimate it robustly via M-estimation, yielding a robust kernel density estimator (RKDE). An RKDE can be computed efficiently via a kernelized iteratively re-weighted least squares (IRWLS) algorithm. Necessary and sufficient conditions are given for kernelized IRWLS to converge to the global minimizer of the M-estimator objective function. The robustness of the RKDE is demonstrated with a representer theorem, the influence function, and experimental results for density estimation and anomaly detection.

私たちは、トレーニングサンプルの汚染に対してロバスト性を示すノンパラメトリック密度推定の方法を提案します。この手法は、従来のカーネル密度推定量(KDE)と従来のM推定のアイデアを組み合わせることで、ロバスト性を実現します。正の半定値カーネルに基づいてKDEを、関連する再現カーネルHilbert空間のサンプル平均として解釈します。サンプル平均は外れ値の影響を受けやすいため、M推定によってロバストに推定し、ロバストなカーネル密度推定量(RKDE)を生成します。RKDEは、カーネル化された反復再重み付け最小二乗法(IRWLS)アルゴリズムを使用して効率的に計算できます。カーネル化されたIRWLSがM-estimator目的関数の大域最小化器に収束するために必要十分な条件が与えられます。RCDEのロバスト性は、表現定理、影響関数、および密度推定と異常検出の実験結果によって実証されています。

Trading Regret for Efficiency: Online Convex Optimization with Long Term Constraints
効率性に対する取引の後悔:長期制約を伴うオンライン凸最適化

In this paper we propose efficient algorithms for solving constrained online convex optimization problems. Our motivation stems from the observation that most algorithms proposed for online convex optimization require a projection onto the convex set K from which the decisions are made. While the projection is straightforward for simple shapes (e.g., Euclidean ball), for arbitrary complex sets it is the main computational challenge and may be inefficient in practice. In this paper, we consider an alternative online convex optimization problem. Instead of requiring that decisions belong to K for all rounds, we only require that the constraints, which define the set K, be satisfied in the long run. By turning the problem into an online convex-concave optimization problem, we propose an efficient algorithm which achieves O(√T) regret bound and O(T3/4) bound on the violation of constraints. Then, we modify the algorithm in order to guarantee that the constraints are satisfied in the long run. This gain is achieved at the price of getting O(T3/4) regret bound. Our second algorithm is based on the mirror prox method (Nemirovski, 2005) to solve variational inequalities which achieves O(T2/3) bound for both regret and the violation of constraints when the domain K can be described by a finite number of linear constraints. Finally, we extend the results to the setting where we only have partial access to the convex set K and propose a multipoint bandit feedback algorithm with the same bounds in expectation as our first algorithm.

この論文では、制約付きオンライン凸最適化問題を解くための効率的なアルゴリズムを提案します。この動機は、オンライン凸最適化用に提案されているほとんどのアルゴリズムが、決定が行われる凸集合Kへの射影を必要とするという観察から生じています。射影は単純な形状(ユークリッド球など)では簡単ですが、任意の複雑な集合では主な計算上の課題となり、実際には非効率的である可能性があります。この論文では、別のオンライン凸最適化問題を検討します。すべてのラウンドで決定がKに属することを要求するのではなく、集合Kを定義する制約が長期的に満たされることのみを要求します。問題をオンライン凸凹最適化問題に変換することで、O(√T)の後悔境界と制約違反のO(T3/4)境界を実現する効率的なアルゴリズムを提案します。次に、長期的に制約が満たされることを保証するためにアルゴリズムを変更します。この利点は、O(T3/4)の後悔境界を得るという代償で達成されます。2番目のアルゴリズムは、ミラー近似法(Nemirovski、2005)に基づいて変分不等式を解き、ドメインKが有限個の線形制約によって記述できる場合に、後悔と制約違反の両方に対してO(T2/3)の境界を実現します。最後に、凸集合Kに部分的にしかアクセスできない設定に結果を拡張し、最初のアルゴリズムと同じ期待値の境界を持つマルチポイントバンディットフィードバックアルゴリズムを提案します。

On the Convergence Rate of lp-Norm Multiple Kernel Learning
lpノルム多重カーネル学習の収束率について

We derive an upper bound on the local Rademacher complexity of lp-norm multiple kernel learning, which yields a tighter excess risk bound than global approaches. Previous local approaches analyzed the case p=1 only while our analysis covers all cases 1≤p≤∞, assuming the different feature mappings corresponding to the different kernels to be uncorrelated. We also show a lower bound that shows that the bound is tight, and derive consequences regarding excess loss, namely fast convergence rates of the order O(n-α/1+α), where α is the minimum eigenvalue decay rate of the individual kernels.

私たちは、lp-norm多重カーネル学習のローカルなRademacher複雑性に上限を導き出し、グローバルアプローチよりも厳しい過剰リスクバウンドをもたらします。以前のローカルアプローチでは、ケースp=1のみを分析しましたが、分析はすべてのケース1≤p≤∞を対象としており、異なるカーネルに対応するさまざまな特徴マッピングが相関していないと仮定しています。また、バウンドがタイトであることを示す下限を示し、過剰な損失に関する結果、つまり、個々のカーネルの最小固有値減衰率であるO(n-α/1+α α)の速い収束率を導き出します。

Characterization and Greedy Learning of Interventional Markov Equivalence Classes of Directed Acyclic Graphs
有向非巡回グラフの介入的マルコフ同等性クラスの特性化と貪欲学習

The investigation of directed acyclic graphs (DAGs) encoding the same Markov property, that is the same conditional independence relations of multivariate observational distributions, has a long tradition; many algorithms exist for model selection and structure learning in Markov equivalence classes. In this paper, we extend the notion of Markov equivalence of DAGs to the case of interventional distributions arising from multiple intervention experiments. We show that under reasonable assumptions on the intervention experiments, interventional Markov equivalence defines a finer partitioning of DAGs than observational Markov equivalence and hence improves the identifiability of causal models. We give a graph theoretic criterion for two DAGs being Markov equivalent under interventions and show that each interventional Markov equivalence class can, analogously to the observational case, be uniquely represented by a chain graph called interventional essential graph (also known as CPDAG in the observational case). These are key insights for deriving a generalization of the Greedy Equivalence Search algorithm aimed at structure learning from interventional data. This new algorithm is evaluated in a simulation study.

多変量観測分布の同じ条件付き独立関係である同じマルコフ特性をエンコードする有向非巡回グラフ(DAG)の研究には長い伝統があり、マルコフ同値クラスでのモデル選択と構造学習のためのアルゴリズムが多数存在します。この論文では、DAGのマルコフ同値の概念を、複数の介入実験から生じる介入分布の場合に拡張します。介入実験に関する合理的な仮定の下で、介入マルコフ同値では観測マルコフ同値よりも細かいDAGの分割が定義され、したがって因果モデルの識別可能性が向上することを示します。介入下で2つのDAGがマルコフ同値であるためのグラフ理論的基準を示し、各介入マルコフ同値クラスは、観測の場合と同様に、介入必須グラフ(観測の場合はCPDAGとも呼ばれる)と呼ばれるチェーングラフによって一意に表されることを示します。これらは、介入データから構造を学習することを目的とした貪欲同値検索アルゴリズムの一般化を導き出すための重要な洞察です。この新しいアルゴリズムは、シミュレーション研究で評価されています。

Multi-Target Regression with Rule Ensembles
ルールアンサンブルによるマルチターゲット回帰

Methods for learning decision rules are being successfully applied to many problem domains, in particular when understanding and interpretation of the learned model is necessary. In many real life problems, we would like to predict multiple related (nominal or numeric) target attributes simultaneously. While several methods for learning rules that predict multiple targets at once exist, they are all based on the covering algorithm, which does not work well for regression problems. A better solution for regression is the rule ensemble approach that transcribes an ensemble of decision trees into a large collection of rules. An optimization procedure is then used to select the best (and much smaller) subset of these rules and to determine their respective weights. We introduce the FIRE algorithm for solving multi-target regression problems, which employs the rule ensembles approach. We improve the accuracy of the algorithm by adding simple linear functions to the ensemble. We also extensively evaluate the algorithm with and without linear functions. The results show that the accuracy of multi-target regression rule ensembles is high. They are more accurate than, for instance, multi-target regression trees, but not quite as accurate as multi-target random forests. The rule ensembles are significantly more concise than random forests, and it is also possible to create compact rule sets that are smaller than a single regression tree but still comparable in accuracy.

決定ルールを学習する方法は、特に学習したモデルの理解と解釈が必要な場合に、多くの問題領域にうまく適用されています。多くの実際の問題では、複数の関連する(名目または数値)ターゲット属性を同時に予測する必要があります。複数のターゲットを一度に予測するルールを学習する方法はいくつかありますが、それらはすべてカバーアルゴリズムに基づいており、回帰問題には適していません。回帰のより優れたソリューションは、決定木のアンサンブルをルールの大規模なコレクションに転記するルールアンサンブルアプローチです。次に、最適化手順を使用して、これらのルールの最適な(はるかに小さい)サブセットを選択し、それぞれの重みを決定します。ルールアンサンブルアプローチを採用した、マルチターゲット回帰問題を解決するためのFIREアルゴリズムを紹介します。アンサンブルに単純な線形関数を追加することで、アルゴリズムの精度を向上させます。また、線形関数の有無でアルゴリズムを徹底的に評価します。結果は、マルチターゲット回帰ルールアンサンブルの精度が高いことを示しています。これらは、たとえばマルチターゲット回帰ツリーよりも正確ですが、マルチターゲットランダムフォレストほど正確ではありません。ルールアンサンブルはランダムフォレストよりもはるかに簡潔であり、単一の回帰ツリーよりも小さくても、精度が同等のコンパクトなルールセットを作成することもできます。

A Local Spectral Method for Graphs: With Applications to Improving Graph Partitions and Exploring Data Graphs Locally
グラフの局所スペクトル法:グラフ分割の改善とデータグラフのローカル探索への応用

The second eigenvalue of the Laplacian matrix and its associated eigenvector are fundamental features of an undirected graph, and as such they have found widespread use in scientific computing, machine learning, and data analysis. In many applications, however, graphs that arise have several local regions of interest, and the second eigenvector will typically fail to provide information fine-tuned to each local region. In this paper, we introduce a locally-biased analogue of the second eigenvector, and we demonstrate its usefulness at highlighting local properties of data graphs in a semi-supervised manner. To do so, we first view the second eigenvector as the solution to a constrained optimization problem, and we incorporate the local information as an additional constraint; we then characterize the optimal solution to this new problem and show that it can be interpreted as a generalization of a Personalized PageRank vector; and finally, as a consequence, we show that the solution can be computed in nearly-linear time. In addition, we show that this locally-biased vector can be used to compute an approximation to the best partition near an input seed set in a manner analogous to the way in which the second eigenvector of the Laplacian can be used to obtain an approximation to the best partition in the entire input graph. Such a primitive is useful for identifying and refining clusters locally, as it allows us to focus on a local region of interest in a semi-supervised manner. Finally, we provide a detailed empirical evaluation of our method by showing how it can applied to finding locally-biased sparse cuts around an input vertex seed set in social and information networks.

ラプラシアン行列の2番目の固有値とそれに関連する固有ベクトルは、無向グラフの基本的な特徴であり、科学計算、機械学習、データ分析で広く使用されています。ただし、多くのアプリケーションでは、グラフには複数のローカル関心領域があり、2番目の固有ベクトルは通常、各ローカル領域に微調整された情報を提供できません。この論文では、2番目の固有ベクトルのローカルバイアスアナログを紹介し、半教師あり方式でデータグラフのローカルプロパティを強調する有用性を示します。そのために、まず2番目の固有ベクトルを制約付き最適化問題のソリューションと見なし、ローカル情報を追加の制約として組み込みます。次に、この新しい問題に対する最適ソリューションを特徴付け、それがPersonalized PageRankベクトルの一般化として解釈できることを示します。最後に、結果として、ソリューションをほぼ線形時間で計算できることを示します。さらに、この局所的に偏ったベクトルを使用して、ラプラシアンの2番目の固有ベクトルを使用して入力グラフ全体の最適なパーティションの近似値を取得するのと類似した方法で、入力シードセット付近の最適なパーティションの近似値を計算できることを示します。このようなプリミティブは、半教師あり方式で関心のあるローカル領域に焦点を当てることができるため、クラスターをローカルに識別して改良するのに役立ちます。最後に、ソーシャルネットワークと情報ネットワークの入力頂点シードセットの周囲で局所的に偏ったスパースカットを見つけるためにこの方法を適用できる方法を示して、この方法の詳細な実証的評価を提供します。

High-Dimensional Gaussian Graphical Model Selection: Walk Summability and Local Separation Criterion
高次元ガウスグラフモデル選択:歩行合計可能性と局所分離基準

We consider the problem of high-dimensional Gaussian graphical model selection. We identify a set of graphs for which an efficient estimation algorithm exists, and this algorithm is based on thresholding of empirical conditional covariances. Under a set of transparent conditions, we establish structural consistency (or sparsistency) for the proposed algorithm, when the number of samples n=Ω(Jmin-2 log p), where p is the number of variables and Jmin is the minimum (absolute) edge potential of the graphical model. The sufficient conditions for sparsistency are based on the notion of walk-summability of the model and the presence of sparse local vertex separators in the underlying graph. We also derive novel non-asymptotic necessary conditions on the number of samples required for sparsistency.

私たちは、高次元のガウスグラフィカルモデル選択の問題について考えます。効率的な推定アルゴリズムが存在するグラフのセットを特定し、このアルゴリズムは経験的条件付き共分散のしきい値化に基づいています。一連の透明な条件下で、サンプル数n=Ω(Jmin-2 log p)の場合、提案されたアルゴリズムの構造的一貫性(またはスパース性)を確立します。ここで、pは変数の数、Jminはグラフィカルモデルの最小(絶対)エッジポテンシャルです。スパース性の十分条件は、モデルのウォーク合計可能性の概念と、基になるグラフにスパースなローカル頂点セパレーターが存在することに基づいています。また、スパーシステンシーに必要なサンプル数について、新しい非漸近的な必要条件を導き出します。

Pairwise Support Vector Machines and their Application to Large Scale Problems
ペアワイズサポートベクターマシンとその大規模問題への応用

Pairwise classification is the task to predict whether the examples a,b of a pair (a,b) belong to the same class or to different classes. In particular, interclass generalization problems can be treated in this way. In pairwise classification, the order of the two input examples should not affect the classification result. To achieve this, particular kernels as well as the use of symmetric training sets in the framework of support vector machines were suggested. The paper discusses both approaches in a general way and establishes a strong connection between them. In addition, an efficient implementation is discussed which allows the training of several millions of pairs. The value of these contributions is confirmed by excellent results on the labeled faces in the wild benchmark.

ペアワイズ分類は、ペアの例a,b(a,b)が同じクラスに属するか異なるクラスに属するかを予測するタスクです。特に、クラス間一般化の問題はこのように扱うことができます。ペアワイズ分類では、2つの入力例の順序が分類結果に影響を与えないようにする必要があります。これを達成するために、特定のカーネルと、サポートベクターマシンのフレームワークでの対称トレーニングセットの使用が提案されました。この論文では、両方のアプローチを一般的な方法で説明し、それらの間に強いつながりを確立します。さらに、数百万のペアのトレーニングを可能にする効率的な実装についても説明します。これらの貢献の価値は、野生のベンチマークでラベル付けされた面の優れた結果によって確認されます。

MedLDA: Maximum Margin Supervised Topic Models
MedLDA: 最大マージン教師ありトピックモデル

A supervised topic model can use side information such as ratings or labels associated with documents or images to discover more predictive low dimensional topical representations of the data. However, existing supervised topic models predominantly employ likelihood-driven objective functions for learning and inference, leaving the popular and potentially powerful max-margin principle unexploited for seeking predictive representations of data and more discriminative topic bases for the corpus. In this paper, we propose the maximum entropy discrimination latent Dirichlet allocation (MedLDA) model, which integrates the mechanism behind the max-margin prediction models (e.g., SVMs) with the mechanism behind the hierarchical Bayesian topic models (e.g., LDA) under a unified constrained optimization framework, and yields latent topical representations that are more discriminative and more suitable for prediction tasks such as document classification or regression. The principle underlying the MedLDA formalism is quite general and can be applied for jointly max-margin and maximum likelihood learning of directed or undirected topic models when supervising side information is available. Efficient variational methods for posterior inference and parameter estimation are derived and extensive empirical studies on several real data sets are also provided. Our experimental results demonstrate qualitatively and quantitatively that MedLDA could: 1) discover sparse and highly discriminative topical representations; 2) achieve state of the art prediction performance; and 3) be more efficient than existing supervised topic models, especially for classification.

教師ありトピックモデルは、ドキュメントや画像に関連付けられた評価やラベルなどの副次情報を使用して、データのより予測的な低次元トピック表現を発見することができます。しかし、既存の教師ありトピックモデルは、学習と推論に尤度駆動型の目的関数を主に使用しているため、データの予測表現やコーパスのより識別的なトピックベースの探索に、人気があり潜在的に強力な最大マージン原理が活用されていません。この論文では、最大エントロピー識別潜在ディリクレ配分(MedLDA)モデルを提案します。このモデルは、統一された制約付き最適化フレームワークの下で、最大マージン予測モデル(SVMなど)の背後にあるメカニズムと階層ベイズトピックモデル(LDAなど)の背後にあるメカニズムを統合し、ドキュメント分類や回帰などの予測タスクに対してより識別的でより適した潜在トピック表現を生成します。MedLDA形式の基礎となる原理は非常に一般的であり、監督サイド情報が利用可能な場合、有向または無向トピックモデルの最大マージンと最大尤度学習の併用に適用できます。事後推論とパラメータ推定のための効率的な変分法が導出され、いくつかの実際のデータセットに関する広範な実証研究も提供されています。私たちの実験結果は、MedLDAが次のことを定性的および定量的に実証しています。1)まばらで非常に識別力の高いトピック表現を発見します。2)最先端の予測パフォーマンスを実現します。3)特に分類において、既存の監督トピックモデルよりも効率的です。

A Topic Modeling Toolbox Using Belief Propagation
確率伝搬法を使用したトピックモデリングツールボックス

Latent Dirichlet allocation (LDA) is an important hierarchical Bayesian model for probabilistic topic modeling, which attracts worldwide interests and touches on many important applications in text mining, computer vision and computational biology. This paper introduces a topic modeling toolbox (TMBP) based on the belief propagation (BP) algorithms. TMBP toolbox is implemented by MEX C++/Matlab/Octave for either Windows 7 or Linux. Compared with existing topic modeling packages, the novelty of this toolbox lies in the BP algorithms for learning LDA-based topic models. The current version includes BP algorithms for latent Dirichlet allocation (LDA), author-topic models (ATM), relational topic models (RTM), and labeled LDA (LaLDA). This toolbox is an ongoing project and more BP-based algorithms for various topic models will be added in the near future. Interested users may also extend BP algorithms for learning more complicated topic models. The source codes are freely available under the GNU General Public Licence, Version 1.0 at https://mloss.org/software/view/399/.

潜在ディリクレ配分法(LDA)は、確率的トピックモデリングのための重要な階層的ベイズモデルであり、世界中の関心を集め、テキストマイニング、コンピュータービジョン、計算生物学における多くの重要なアプリケーションに関係しています。この論文では、ビリーフプロパゲーション(BP)アルゴリズムに基づくトピックモデリングツールボックス(TMBP)を紹介します。TMBPツールボックスは、Windows 7またはLinux用のMEX C++/Matlab/Octaveによって実装されています。既存のトピックモデリングパッケージと比較した場合、このツールボックスの新規性は、LDAベースのトピックモデルを学習するためのBPアルゴリズムにあります。現在のバージョンには、潜在ディリクレ配分法(LDA)、著者トピックモデル(ATM)、リレーショナルトピックモデル(RTM)、およびラベル付きLDA (LaLDA)用のBPアルゴリズムが含まれています。このツールボックスは進行中のプロジェクトであり、近い将来、さまざまなトピックモデル用のBPベースのアルゴリズムがさらに追加される予定です。関心のあるユーザーは、より複雑なトピックモデルを学習するためにBPアルゴリズムを拡張することもできます。ソースコードは、GNU General Public Licenseバージョン1.0に基づいてhttps://mloss.org/software/view/399/から無料で入手できます。

Sign Language Recognition using Sub-Units
サブユニットを使用した手話認識

This paper discusses sign language recognition using linguistic sub-units. It presents three types of sub-units for consideration; those learnt from appearance data as well as those inferred from both 2D or 3D tracking data. These sub-units are then combined using a sign level classifier; here, two options are presented. The first uses Markov Models to encode the temporal changes between sub-units. The second makes use of Sequential Pattern Boosting to apply discriminative feature selection at the same time as encoding temporal information. This approach is more robust to noise and performs well in signer independent tests, improving results from the 54% achieved by the Markov Chains to 76%.

この論文では、言語サブユニットを使用した手話認識について説明します。検討のために3種類のサブユニットを提示します。外観データから学習したものと、2Dまたは3Dトラッキングデータの両方から推測されたもの。次に、これらのサブユニットは、符号レベル分類器を使用して結合されます。ここでは、2つのオプションを示します。1つ目は、マルコフモデルを使用して、サブユニット間の時間的変化をエンコードします。2つ目は、シーケンシャルパターンブースティングを使用して、時間情報のエンコードと同時に識別的な特徴選択を適用します。このアプローチはノイズに対してより堅牢で、署名者に依存しないテストで優れたパフォーマンスを発揮し、マルコフ連鎖で達成された54%から76%に結果が向上しました。

An Introduction to Artificial Prediction Markets for Classification
分類のための人工予測市場の紹介

Prediction markets are used in real life to predict outcomes of interest such as presidential elections. This paper presents a mathematical theory of artificial prediction markets for supervised learning of conditional probability estimators. The artificial prediction market is a novel method for fusing the prediction information of features or trained classifiers, where the fusion result is the contract price on the possible outcomes. The market can be trained online by updating the participants’ budgets using training examples. Inspired by the real prediction markets, the equations that govern the market are derived from simple and reasonable assumptions. Efficient numerical algorithms are presented for solving these equations. The obtained artificial prediction market is shown to be a maximum likelihood estimator. It generalizes linear aggregation, existent in boosting and random forest, as well as logistic regression and some kernel methods. Furthermore, the market mechanism allows the aggregation of specialized classifiers that participate only on specific instances. Experimental comparisons show that the artificial prediction markets often outperform random forest and implicit online learning on synthetic data and real UCI data sets. Moreover, an extensive evaluation for pelvic and abdominal lymph node detection in CT data shows that the prediction market improves adaboost’s detection rate from 79.6% to 81.2% at 3 false positives/volume.

予測市場は、大統領選挙などの関心のある結果を予測するために実生活で使用されています。この論文では、条件付き確率推定量の教師あり学習のための人工予測市場の数学的理論を紹介します。人工予測市場は、特徴または訓練された分類器の予測情報を融合する新しい方法であり、融合結果は可能な結果の契約価格です。市場は、トレーニング例を使用して参加者の予算を更新することでオンラインでトレーニングできます。実際の予測市場に触発され、市場を支配する方程式は単純で合理的な仮定から導き出されます。これらの方程式を解くための効率的な数値アルゴリズムが提示されています。得られた人工予測市場は、最大尤度推定量であることが示されています。これは、ブースティングとランダムフォレスト、ロジスティック回帰、およびいくつかのカーネル法に存在する線形集約を一般化します。さらに、市場メカニズムにより、特定のインスタンスにのみ参加する特殊な分類器の集約が可能になります。実験的な比較により、人工予測市場は、合成データと実際のUCIデータセットでランダムフォレストや暗黙のオンライン学習よりも優れていることが示されています。さらに、CTデータでの骨盤および腹部のリンパ節検出に関する広範な評価では、予測市場によって、ボリュームあたり3つの誤検出でadaboostの検出率が79.6%から81.2%に向上することが示されています。

DEAP: Evolutionary Algorithms Made Easy
DEAP:進化的アルゴリズムが簡単に

DEAP is a novel evolutionary computation framework for rapid prototyping and testing of ideas. Its design departs from most other existing frameworks in that it seeks to make algorithms explicit and data structures transparent, as opposed to the more common black-box frameworks. Freely available with extensive documentation at http://deap.gel.ulaval.ca, DEAP is an open source project under an LGPL license.

DEAPは、アイデアのラピッドプロトタイピングとテストのための新しい進化的計算フレームワークです。その設計は、アルゴリズムを明示し、データ構造を透明にしようとするという点で、他のほとんどの既存のフレームワークとは異なります。これは、より一般的なブラックボックスフレームワークとは対照的です。http://deap.gel.ulaval.caで広範なドキュメントとともに無料で利用できるDEAPは、LGPLライセンスに基づくオープンソースプロジェクトです。

On the Necessity of Irrelevant Variables
無関係な変数の必要性について

This work explores the effects of relevant and irrelevant boolean variables on the accuracy of classifiers. The analysis uses the assumption that the variables are conditionally independent given the class, and focuses on a natural family of learning algorithms for such sources when the relevant variables have a small advantage over random guessing. The main result is that algorithms relying predominately on irrelevant variables have error probabilities that quickly go to 0 in situations where algorithms that limit the use of irrelevant variables have errors bounded below by a positive constant. We also show that accurate learning is possible even when there are so few examples that one cannot determine with high confidence whether or not any individual variable is relevant.

この研究では、関連性のあるブール変数と無関係なブール変数が分類器の精度に及ぼす影響を調査します。この分析では、変数がクラスを与えられたときに条件付きで独立しているという仮定を使用し、関連する変数がランダム推測よりもわずかに優れている場合、そのようなソースの学習アルゴリズムの自然なファミリーに焦点を当てます。主な結果は、主に無関係な変数に依存するアルゴリズムのエラー確率は、無関係な変数の使用を制限するアルゴリズムのエラーが正の定数によって制限されている状況では、すぐに0になるということです。また、個々の変数が関連しているかどうかを高い信頼性で判断できないほど例が少ない場合でも、正確な学習が可能であることを示します。

A Comparison of the Lasso and Marginal Regression
LassoとMarginal回帰の比較

The lasso is an important method for sparse, high-dimensional regression problems, with efficient algorithms available, a long history of practical success, and a large body of theoretical results supporting and explaining its performance. But even with the best available algorithms, finding the lasso solutions remains a computationally challenging task in cases where the number of covariates vastly exceeds the number of data points. Marginal regression, where each dependent variable is regressed separately on each covariate, offers a promising alternative in this case because the estimates can be computed roughly two orders faster than the lasso solutions. The question that remains is how the statistical performance of the method compares to that of the lasso in these cases. In this paper, we study the relative statistical performance of the lasso and marginal regression for sparse, high-dimensional regression problems. We consider the problem of learning which coefficients are non-zero. Our main results are as follows: (i) we compare the conditions under which the lasso and marginal regression guarantee exact recovery in the fixed design, noise free case; (ii) we establish conditions under which marginal regression provides exact recovery with high probability in the fixed design, noise free, random coefficients case; and (iii) we derive rates of convergence for both procedures, where performance is measured by the number of coefficients with incorrect sign, and characterize the regions in the parameter space recovery is and is not possible under this metric. In light of the computational advantages of marginal regression in very high dimensional problems, our theoretical and simulations results suggest that the procedure merits further study.

Lassoは、効率的なアルゴリズムが利用可能で、実践的な成功の長い歴史があり、そのパフォーマンスをサポートし説明する膨大な理論的結果を備えた、スパースで高次元の回帰問題に対する重要な手法です。しかし、利用可能な最良のアルゴリズムであっても、共変量の数がデータポイントの数を大幅に上回る場合には、Lassoソリューションを見つけることは計算上困難な作業のままです。各従属変数が各共変量で個別に回帰される限界回帰は、推定値をLassoソリューションよりも約2桁速く計算できるため、この場合に有望な代替手段となります。残る疑問は、これらの場合にこの手法の統計的パフォーマンスがLassoのそれと比べてどうなのかということです。この論文では、スパースで高次元の回帰問題に対するLassoと限界回帰の相対的な統計的パフォーマンスを検討します。どの係数がゼロでないかを学習する問題について検討します。主な結果は次のとおりです。(i)固定設計、ノイズなしの場合にLassoと限界回帰が正確な回復を保証する条件を比較します。(ii)固定設計、ノイズなし、ランダム係数の場合に限界回帰が高確率で正確な回復を提供する条件を確立します。(iii)両方の手順の収束率を導出します。パフォーマンスは、誤った符号を持つ係数の数で測定され、この測定基準で回復が可能なパラメーター空間領域と不可能なパラメーター空間領域が特徴付けられます。非常に高次元の問題における限界回帰の計算上の利点を考慮すると、理論とシミュレーションの結果は、この手順がさらに研究される価値があることを示唆しています。

Optimistic Bayesian Sampling in Contextual-Bandit Problems
文脈バンディット問題における楽観的ベイジアンサンプリング

In sequential decision problems in an unknown environment, the decision maker often faces a dilemma over whether to explore to discover more about the environment, or to exploit current knowledge. We address the exploration-exploitation dilemma in a general setting encompassing both standard and contextualised bandit problems. The contextual bandit problem has recently resurfaced in attempts to maximise click-through rates in web based applications, a task with significant commercial interest. In this article we consider an approach of Thompson (1933) which makes use of samples from the posterior distributions for the instantaneous value of each action. We extend the approach by introducing a new algorithm, Optimistic Bayesian Sampling (OBS), in which the probability of playing an action increases with the uncertainty in the estimate of the action value. This results in better directed exploratory behaviour. We prove that, under unrestrictive assumptions, both approaches result in optimal behaviour with respect to the average reward criterion of Yang and Zhu (2002). We implement OBS and measure its performance in simulated Bernoulli bandit and linear regression domains, and also when tested with the task of personalised news article recommendation on a Yahoo! Front Page Today Module data set. We find that OBS performs competitively when compared to recently proposed benchmark algorithms and outperforms Thompson’s method throughout.

未知の環境における逐次的な意思決定問題では、意思決定者は環境についてさらに詳しく知るために探索するか、現在の知識を活用するかというジレンマに直面することがよくあります。私たちは、標準的なバンディット問題とコンテキスト化されたバンディット問題の両方を含む一般的な設定で、探索と活用のジレンマを取り上げます。コンテキスト化されたバンディット問題は、Webベースのアプリケーションでクリックスルー率を最大化する試みで最近再浮上しており、これは商業的に大きな関心を集めているタスクです。この記事では、各アクションの瞬間的な値について事後分布からのサンプルを使用するThompson (1933)のアプローチを検討します。私たちは、アクションを実行する確率がアクション値の推定値の不確実性とともに増加する、新しいアルゴリズムであるOptimistic Bayesian Sampling (OBS)を導入することで、このアプローチを拡張します。これにより、より指向性の高い探索行動が実現します。私たちは、制限のない仮定の下で、両方のアプローチがYangとZhu (2002)の平均報酬基準に関して最適な行動をもたらすことを証明します。OBSを実装し、シミュレートされたベルヌーイバンディットおよび線形回帰ドメインでのパフォーマンスを測定し、また、Yahoo! Front Page Today Moduleデータセットでのパーソナライズされたニュース記事の推奨タスクでテストした場合のパフォーマンスも測定しました。最近提案されたベンチマークアルゴリズムと比較して、OBSのパフォーマンスは競争力があり、全体的にThompsonの方法よりも優れていることがわかりました。

Pattern for Python
Python のパターン

Pattern is a package for Python 2.4+ with functionality for web mining (Google + Twitter + Wikipedia, web spider, HTML DOM parser), natural language processing (tagger/chunker, n-gram search, sentiment analysis, WordNet), machine learning (vector space model, k-means clustering, Naive Bayes + k-NN + SVM classifiers) and network analysis (graph centrality and visualization). It is well documented and bundled with 30+ examples and 350+ unit tests. The source code is licensed under BSD and available from http://www.clips.ua.ac.be/pages/pattern.

PatternはPython 2.4+用のパッケージで、Webマイニング(Google + Twitter + Wikipedia、Webスパイダー、HTML DOMパーサー)、自然言語処理(タガー/チャンカー、n-gram検索、感情分析、WordNet)、機械学習(ベクトル空間モデル、k-meansクラスタリング、単純ベイズ+ k-NN + SVM分類器)、ネットワーク分析(グラフ中心性と視覚化)の機能があります。これは十分に文書化されており、30 +の例と350 +ユニットテストがバンドルされています。ソースコードはBSDの下でライセンスされており、http://www.clips.ua.ac.be/pages/patternから入手できます。

EP-GIG Priors and Applications in Bayesian Sparse Learning
ベイズスパース学習におけるEP-GIGの事前確率と応用

In this paper we propose a novel framework for the construction of sparsity-inducing priors. In particular, we define such priors as a mixture of exponential power distributions with a generalized inverse Gaussian density (EP-GIG). EP-GIG is a variant of generalized hyperbolic distributions, and the special cases include Gaussian scale mixtures and Laplace scale mixtures. Furthermore, Laplace scale mixtures can subserve a Bayesian framework for sparse learning with nonconvex penalization. The densities of EP-GIG can be explicitly expressed. Moreover, the corresponding posterior distribution also follows a generalized inverse Gaussian distribution. We exploit these properties to develop EM algorithms for sparse empirical Bayesian learning. We also show that these algorithms bear an interesting resemblance to iteratively reweighted l2 or l1 methods. Finally, we present two extensions for grouped variable selection and logistic regression.

この論文では、スパース性を誘導する事前確率を構築するための新しいフレームワークを提案します。特に、このような事前確率は、指数関数的電力分布と一般化逆ガウス密度(EP-GIG)の混合物として定義します。EP-GIGは一般化双曲線分布の変形であり、特殊なケースにはガウススケールの混合物とラプラススケールの混合物が含まれます。さらに、ラプラススケールの混合物は、非凸ペナルティを伴うスパース学習のためのベイズフレームワークを補助できます。EP-GIGの密度は明示的に表現できます。さらに、対応する事後分布も一般化された逆ガウス分布に従います。これらの特性を利用して、スパース経験的ベイズ学習のためのEMアルゴリズムを開発します。また、これらのアルゴリズムが、反復的に再重み付けされたl2またはl1メソッドと興味深い類似性を持っていることも示します。最後に、グループ化された変数選択とロジスティック回帰の2つの拡張を示します。

An Improved GLMNET for L1-regularized Logistic Regression
L1正則化ロジスティック回帰のための改良型GLMNET

Recently, Yuan et al. (2010) conducted a comprehensive comparison on software for L1-regularized classification. They concluded that a carefully designed coordinate descent implementation CDN is the fastest among state-of-the-art solvers. In this paper, we point out that CDN is less competitive on loss functions that are expensive to compute. In particular, CDN for logistic regression is much slower than CDN for SVM because the logistic loss involves expensive exp/log operations. In optimization, Newton methods are known to have fewer iterations although each iteration costs more. Because solving the Newton sub-problem is independent of the loss calculation, this type of methods may surpass CDN under some circumstances. In L1-regularized classification, GLMNET by Friedman et al. is already a Newton-type method, but experiments in Yuan et al. (2010) indicated that the existing GLMNET implementation may face difficulties for some large-scale problems. In this paper, we propose an improved GLMNET to address some theoretical and implementation issues. In particular, as a Newton-type method, GLMNET achieves fast local convergence, but may fail to quickly obtain a useful solution. By a careful design to adjust the effort for each iteration, our method is efficient for both loosely or strictly solving the optimization problem. Experiments demonstrate that our improved GLMNET is more efficient than CDN for L1-regularized logistic regression.

最近、Yuanら(2010)は、L1正則化分類のソフトウェアに関する包括的な比較を実施しました。彼らは、注意深く設計された座標降下実装CDNが最先端のソルバーの中で最も高速であると結論付けました。この論文では、計算コストが高い損失関数ではCDNの競争力が低いことを指摘しています。特に、ロジスティック回帰のCDNは、ロジスティック損失にコストの高いexp/log演算が含まれるため、SVMのCDNよりもはるかに低速です。最適化では、ニュートン法は反復回数が少ないことが知られていますが、各反復のコストは高くなります。ニュートン部分問題の解決は損失計算とは無関係であるため、このタイプの方法は状況によってはCDNを上回る可能性があります。L1正則化分類では、FriedmanらによるGLMNETはすでにニュートン型法ですが、Yuanら(2010)の実験では、既存のGLMNET実装では一部の大規模問題で困難に直面する可能性があることが示されました。この論文では、いくつかの理論的および実装上の問題に対処するために、改良されたGLMNETを提案します。特に、ニュートン型法として、GLMNETは高速な局所収束を実現しますが、有用なソリューションをすぐに取得できない場合があります。各反復の労力を調整するように慎重に設計することにより、私たちの方法は、最適化問題を緩く解決する場合も厳密に解決する場合も効率的です。実験により、改良されたGLMNETは、L1正規化ロジスティック回帰に対してCDNよりも効率的であることが実証されています。

Variable Selection in High-dimensional Varying-coefficient Models with Global Optimality
大域最適性を持つ高次元変動係数モデルにおける変数選択

The varying-coefficient model is flexible and powerful for modeling the dynamic changes of regression coefficients. It is important to identify significant covariates associated with response variables, especially for high-dimensional settings where the number of covariates can be larger than the sample size. We consider model selection in the high-dimensional setting and adopt difference convex programming to approximate the L0 penalty, and we investigate the global optimality properties of the varying-coefficient estimator. The challenge of the variable selection problem here is that the dimension of the nonparametric form for the varying-coefficient modeling could be infinite, in addition to dealing with the high-dimensional linear covariates. We show that the proposed varying-coefficient estimator is consistent, enjoys the oracle property and achieves an optimal convergence rate for the non-zero nonparametric components for high-dimensional data. Our simulations and numerical examples indicate that the difference convex algorithm is efficient using the coordinate decent algorithm, and is able to select the true model at a higher frequency than the least absolute shrinkage and selection operator (LASSO), the adaptive LASSO and the smoothly clipped absolute deviation (SCAD) approaches.

変動係数モデルは、回帰係数の動的な変化をモデル化するのに柔軟かつ強力です。応答変数に関連する重要な共変量を識別することは、共変量の数がサンプルサイズよりも大きくなる可能性がある高次元設定では特に重要です。高次元設定でのモデル選択を考慮し、差分凸計画法を採用してL0ペナルティを近似し、変動係数推定量のグローバル最適性特性を調査します。ここでの変数選択問題の課題は、高次元の線形共変量を処理することに加えて、変動係数モデリングのノンパラメトリック形式の次元が無限になる可能性があることです。提案された変動係数推定量は一貫性があり、オラクル特性を備え、高次元データの非ゼロノンパラメトリックコンポーネントの最適な収束率を実現することを示します。私たちのシミュレーションと数値例は、差分凸アルゴリズムが座標降下アルゴリズムを使用することで効率的であり、最小絶対収縮および選択演算子(LASSO)、適応型LASSO、および滑らかにクリップされた絶対偏差(SCAD)アプローチよりも高い頻度で真のモデルを選択できることを示しています。

Jstacs: A Java Framework for Statistical Analysis and Classification of Biological Sequences
Jstacs: 生物学的配列の統計分析と分類のためのJavaフレームワーク

Jstacs is an object-oriented Java library for analysing and classifying sequence data, which emerged from the need for a standardized implementation of statistical models, learning principles, classifiers, and performance measures. In Jstacs, these components can be used, combined, and extended easily, which allows for a direct comparison of different approaches and fosters the development of new components. Jstacs is especially tailored to biological sequence data, but is also applicable to general discrete and continuous data. Jstacs is freely available at http://www.jstacs.de under the GNU GPL license including an API documentation, a cookbook, and code examples.

Jstacsは、統計モデル、学習原理、分類器、およびパフォーマンス測定の標準化された実装の必要性から生まれた、シーケンスデータを分析および分類するためのオブジェクト指向Javaライブラリです。Jstacsでは、これらのコンポーネントを簡単に使用、組み合わせ、拡張できるため、さまざまなアプローチを直接比較でき、新しいコンポーネントの開発が促進されます。Jstacsは、特に生物学的配列データに特化して対応していますが、一般的な離散データや連続データにも適用可能です。Jstacsは、GNU GPLライセンスの下で、APIドキュメント、クックブック、コード例など、http://www.jstacs.deで無料で入手できます。

Integrating a Partial Model into Model Free Reinforcement Learning
偏モデルからモデル自由強化学習への統合

In reinforcement learning an agent uses online feedback from the environment in order to adaptively select an effective policy. Model free approaches address this task by directly mapping environmental states to actions, while model based methods attempt to construct a model of the environment, followed by a selection of optimal actions based on that model. Given the complementary advantages of both approaches, we suggest a novel procedure which augments a model free algorithm with a partial model. The resulting hybrid algorithm switches between a model based and a model free mode, depending on the current state and the agent’s knowledge. Our method relies on a novel definition for a partially known model, and an estimator that incorporates such knowledge in order to reduce uncertainty in stochastic approximation iterations. We prove that such an approach leads to improved policy evaluation whenever environmental knowledge is available, without compromising performance when such knowledge is absent. Numerical simulations demonstrate the effectiveness of the approach on policy gradient and Q-learning algorithms, and its usefulness in solving a call admission control problem.

強化学習では、エージェントは環境からのオンラインフィードバックを使用して、効果的なポリシーを適応的に選択します。モデルフリーアプローチでは、環境状態をアクションに直接マッピングすることでこのタスクに対処します。一方、モデルベースメソッドでは、環境のモデルを構築し、そのモデルに基づいて最適なアクションを選択します。両方のアプローチの相補的な利点を考慮して、モデルフリーアルゴリズムを部分モデルで拡張する新しい手順を提案します。結果として得られるハイブリッドアルゴリズムは、現在の状態とエージェントの知識に応じて、モデルベースモードとモデルフリーモードを切り替えます。この方法は、部分的に既知のモデルの新しい定義と、そのような知識を組み込んで確率的近似反復の不確実性を減らす推定量に依存しています。このようなアプローチにより、環境知識が利用できる場合は常にポリシー評価が向上し、そのような知識がない場合でもパフォーマンスが損なわれることがないことを証明します。数値シミュレーションにより、このアプローチがポリシー勾配およびQ学習アルゴリズムに有効であること、およびコールアドミッション制御問題を解決する上での有用性が実証されています。

Confidence-Weighted Linear Classification for Text Categorization
テキスト分類のための信頼度加重線形分類

Confidence-weighted online learning is a generalization of margin-based learning of linear classifiers in which the margin constraint is replaced by a probabilistic constraint based on a distribution over classifier weights that is updated online as examples are observed. The distribution captures a notion of confidence on classifier weights, and in some cases it can also be interpreted as replacing a single learning rate by adaptive per-weight rates. Confidence-weighted learning was motivated by the statistical properties of natural-language classification tasks, where most of the informative features are relatively rare. We investigate several versions of confidence-weighted learning that use a Gaussian distribution over weight vectors, updated at each observed example to achieve high probability of correct classification for the example. Empirical evaluation on a range of text-categorization tasks show that our algorithms improve over other state-of-the-art online and batch methods, learn faster in the online setting, and lead to better classifier combination for a type of distributed training commonly used in cloud computing.

信頼度加重オンライン学習は、マージン制約が、例が観察されるたびにオンラインで更新される分類器の重みの分布に基づく確率制約に置き換えられた、線形分類器のマージンベース学習の一般化です。分布は分類器の重みに対する信頼の概念を捉えており、場合によっては、単一の学習率を重みごとの適応率に置き換えると解釈することもできます。信頼度加重学習は、有益な特徴のほとんどが比較的まれである自然言語分類タスクの統計的特性に触発されました。私たちは、例の正しい分類の確率を高めるために、観察される例ごとに更新される重みベクトルのガウス分布を使用する信頼度加重学習のいくつかのバージョンを調査します。さまざまなテキスト分類タスクの実証的評価により、私たちのアルゴリズムは他の最先端のオンラインおよびバッチ方式よりも優れており、オンライン設定でより速く学習し、クラウドコンピューティングで一般的に使用されるタイプの分散トレーニングに適した分類器の組み合わせにつながることが示されています。

Regularization Techniques for Learning with Matrices
行列を用いた学習のための正則化手法

There is growing body of learning problems for which it is natural to organize the parameters into a matrix. As a result, it becomes easy to impose sophisticated prior knowledge by appropriately regularizing the parameters under some matrix norm. This work describes and analyzes a systematic method for constructing such matrix-based regularization techniques. In particular, we focus on how the underlying statistical properties of a given problem can help us decide which regularization function is appropriate. Our methodology is based on a known duality phenomenon: a function is strongly convex with respect to some norm if and only if its conjugate function is strongly smooth with respect to the dual norm. This result has already been found to be a key component in deriving and analyzing several learning algorithms. We demonstrate the potential of this framework by deriving novel generalization and regret bounds for multi-task learning, multi-class learning, and multiple kernel learning.

パラメータを行列に整理することが自然な学習問題が増えています。その結果、何らかの行列ノルムの下でパラメータを適切に正規化することで、高度な事前知識を課すことが容易になります。この研究では、そのような行列ベースの正規化手法を構築するための体系的な方法について説明し、分析します。特に、特定の問題の根底にある統計的特性が、どの正規化関数が適切かを判断するのにどのように役立つかに焦点を当てます。私たちの方法論は、既知の双対性現象に基づいています。つまり、関数があるノルムに関して強く凸であるためには、その共役関数が双対ノルムに関して強く滑らかである必要があります。この結果は、いくつかの学習アルゴリズムを導出および分析するための重要な要素であることがすでにわかっています。私たちは、マルチタスク学習、マルチクラス学習、およびマルチカーネル学習の新しい一般化境界と後悔境界を導出することで、このフレームワークの可能性を実証します。

Estimation and Selection via Absolute Penalized Convex Minimization And Its Multistage Adaptive Applications
絶対ペナルティ凸最小化による推定と選択とその多段階適応応用

The l1-penalized method, or the Lasso, has emerged as an important tool for the analysis of large data sets. Many important results have been obtained for the Lasso in linear regression which have led to a deeper understanding of high-dimensional statistical problems. In this article, we consider a class of weighted l1-penalized estimators for convex loss functions of a general form, including the generalized linear models. We study the estimation, prediction, selection and sparsity properties of the weighted l1-penalized estimator in sparse, high-dimensional settings where the number of predictors p can be much larger than the sample size n. Adaptive Lasso is considered as a special case. A multistage method is developed to approximate concave regularized estimation by applying an adaptive Lasso recursively. We provide prediction and estimation oracle inequalities for single- and multi-stage estimators, a general selection consistency theorem, and an upper bound for the dimension of the Lasso estimator. Important models including the linear regression, logistic regression and log-linear models are used throughout to illustrate the applications of the general results.

l1ペナルティ法、またはLassoは、大規模なデータセットの分析のための重要なツールとして登場しました。線形回帰におけるLassoに関して多くの重要な結果が得られ、高次元の統計的問題に対する理解が深まりました。この記事では、一般化線形モデルを含む一般的な形式の凸損失関数に対する重み付きl1ペナルティ推定量のクラスを検討します。予測変数の数pがサンプルサイズnよりはるかに大きくなる可能性があるスパースな高次元設定での重み付きl1ペナルティ推定量の推定、予測、選択、およびスパース特性を調べます。適応型Lassoは特殊なケースと見なされます。適応型Lassoを再帰的に適用することで、凹型正規化推定を近似する多段階法が開発されました。単一段階および多段階推定量の予測および推定オラクル不等式、一般的な選択一貫性定理、およびLasso推定量の次元の上限を示します。一般的な結果の応用を説明するために、線形回帰、ロジスティック回帰、対数線形モデルなどの重要なモデルが全体を通して使用されています。

Entropy Search for Information-Efficient Global Optimization
情報効率の高い大域最適化のためのエントロピー探索

Contemporary global optimization algorithms are based on local measures of utility, rather than a probability measure over location and value of the optimum. They thus attempt to collect low function values, not to learn about the optimum. The reason for the absence of probabilistic global optimizers is that the corresponding inference problem is intractable in several ways. This paper develops desiderata for probabilistic optimization algorithms, then presents a concrete algorithm which addresses each of the computational intractabilities with a sequence of approximations and explicitly addresses the decision problem of maximizing information gain from each evaluation.

現代のグローバル最適化アルゴリズムは、最適の位置と値に対する確率尺度ではなく、局所的な効用尺度に基づいています。したがって、彼らは最適について学習するのではなく、低い関数値を収集しようとします。確率的グローバルオプティマイザーがない理由は、対応する推論問題がいくつかの点で扱いにくいためです。この論文では、確率的最適化アルゴリズムのdesiderataを開発し、次に、一連の近似で各計算難解性に対処し、各評価からの情報利得を最大化する決定問題に明示的に対処する具体的なアルゴリズムを提示します。

Variational Multinomial Logit Gaussian Process
変分多項ロジットガウス過程

Gaussian process prior with an appropriate likelihood function is a flexible non-parametric model for a variety of learning tasks. One important and standard task is multi-class classification, which is the categorization of an item into one of several fixed classes. A usual likelihood function for this is the multinomial logistic likelihood function. However, exact inference with this model has proved to be difficult because high-dimensional integrations are required. In this paper, we propose a variational approximation to this model, and we describe the optimization of the variational parameters. Experiments have shown our approximation to be tight. In addition, we provide data-independent bounds on the marginal likelihood of the model, one of which is shown to be much tighter than the existing variational mean-field bound in the experiments. We also derive a proper lower bound on the predictive likelihood that involves the Kullback-Leibler divergence between the approximating and the true posterior. We combine our approach with a recently proposed sparse approximation to give a variational sparse approximation to the Gaussian process multi-class model. We also derive criteria which can be used to select the inducing set, and we show the effectiveness of these criteria over random selection in an experiment.

適切な尤度関数を持つガウス過程事前分布は、さまざまな学習タスクのための柔軟なノンパラメトリックモデルです。重要かつ標準的なタスクの1つは、アイテムを複数の固定クラスのいずれかに分類する多クラス分類です。この尤度関数の通常の尤度関数は、多項ロジスティック尤度関数です。ただし、高次元の積分が必要なため、このモデルによる正確な推論は困難であることが判明しています。この論文では、このモデルの変分近似を提案し、変分パラメータの最適化について説明します。実験により、この近似が厳密であることが示されています。さらに、モデルの周辺尤度に関するデータに依存しない境界を提供します。そのうちの1つは、実験で既存の変分平均場境界よりもはるかに厳密であることが示されています。また、近似事後分布と真の事後分布の間のKullback-Leiblerダイバージェンスを含む予測尤度の適切な下限を導出します。私たちは、最近提案されたスパース近似と我々のアプローチを組み合わせて、ガウス過程マルチクラスモデルに変分スパース近似を与えます。また、誘導セットを選択するために使用できる基準を導出し、実験でこれらの基準がランダム選択よりも有効であることを示します。

Manifold Identification in Dual Averaging for Regularized Stochastic Online Learning
正則化確率的オンライン学習のための双対平均化における多様体同定

Iterative methods that calculate their steps from approximate subgradient directions have proved to be useful for stochastic learning problems over large and streaming data sets. When the objective consists of a loss function plus a nonsmooth regularization term, the solution often lies on a low-dimensional manifold of parameter space along which the regularizer is smooth. (When an l1 regularizer is used to induce sparsity in the solution, for example, this manifold is defined by the set of nonzero components of the parameter vector.) This paper shows that a regularized dual averaging algorithm can identify this manifold, with high probability, before reaching the solution. This observation motivates an algorithmic strategy in which, once an iterate is suspected of lying on an optimal or near-optimal manifold, we switch to a “local phase” that searches in this manifold, thus converging rapidly to a near-optimal point. Computational results are presented to verify the identification property and to illustrate the effectiveness of this approach.

近似的なサブグラディエント方向からステップを計算する反復法は、大規模でストリーミングなデータセットに対する確率的学習問題に有効であることが証明されています。目的関数が損失関数と滑らかでない正則化項で構成されている場合、解は多くの場合、正則化が滑らかであるパラメータ空間の低次元多様体上にあります。(たとえば、l1正則化を使用して解にスパース性を誘発する場合、この多様体はパラメータベクトルの非ゼロ成分のセットによって定義されます。)この論文では、正則化されたデュアル平均化アルゴリズムが、解に到達する前にこの多様体を高い確率で識別できることを示しています。この観察から、反復が最適または最適に近い多様体上にあると疑われると、この多様体を検索する「ローカルフェーズ」に切り替えて、最適に近いポイントに急速に収束するというアルゴリズム戦略が生まれます。識別特性を検証し、このアプローチの有効性を示すために、計算結果を示します。

glm-ie: Generalised Linear Models Inference & Estimation Toolbox
glm-ie: 一般化線形モデル推論 & 推定ツールボックス

The glm-ie toolbox contains functionality for estimation and inference in generalised linear models over continuous-valued variables. Besides a variety of penalised least squares solvers for estimation, it offers inference based on (convex) variational bounds, on expectation propagation and on factorial mean field. Scalable and efficient inference in fully-connected undirected graphical models or Markov random fields with Gaussian and non-Gaussian potentials is achieved by casting all the computations as matrix vector multiplications. We provide a wide choice of penalty functions for estimation, potential functions for inference and matrix classes with lazy evaluation for convenient modelling. We designed the glm-ie package to be simple, generic and easily expansible. Most of the code is written in Matlab including some MEX files to be fully compatible to both Matlab 7.x and GNU Octave 3.3.x. Large scale probabilistic classification as well as sparse linear modelling can be performed in a common algorithmical framework by the glm-ie toolkit.

glm-ieツールボックスには、連続値変数の一般化線形モデルにおける推定と推論の機能が含まれています。推定のためのさまざまなペナルティ付き最小二乗ソルバーに加えて、(凸)変分境界、期待値伝播、および階乗平均場に基づく推論を提供します。ガウスおよび非ガウスポテンシャルを持つ完全結合無向グラフィカルモデルまたはマルコフランダムフィールドにおけるスケーラブルで効率的な推論は、すべての計算を行列ベクトル乗算としてキャストすることで実現されます。推定用のペナルティ関数、推論用のポテンシャル関数、便利なモデリングのための遅延評価を備えた行列クラスなど、幅広い選択肢を提供します。glm-ieパッケージは、シンプルで汎用的、かつ簡単に拡張できるように設計されました。コードのほとんどはMatlabで記述されており、一部のMEXファイルはMatlab 7.xとGNU Octave 3.3.xの両方に完全に互換性があります。glm-ieツールキットを使用すると、共通のアルゴリズムフレームワークで大規模な確率分類とスパース線形モデリングを実行できます。

Restricted Strong Convexity and Weighted Matrix Completion: Optimal Bounds with Noise
制限付き強凸と重み付け行列の完了: ノイズによる最適範囲

We consider the matrix completion problem under a form of row/column weighted entrywise sampling, including the case of uniform entrywise sampling as a special case. We analyze the associated random observation operator, and prove that with high probability, it satisfies a form of restricted strong convexity with respect to weighted Frobenius norm. Using this property, we obtain as corollaries a number of error bounds on matrix completion in the weighted Frobenius norm under noisy sampling and for both exact and near low-rank matrices. Our results are based on measures of the “spikiness” and “low-rankness” of matrices that are less restrictive than the incoherence conditions imposed in previous work. Our technique involves an M-estimator that includes controls on both the rank and spikiness of the solution, and we establish non-asymptotic error bounds in weighted Frobenius norm for recovering matrices lying with lq-“balls” of bounded spikiness. Using information-theoretic methods, we show that no algorithm can achieve better estimates (up to a logarithmic factor) over these same sets, showing that our conditions on matrices and associated rates are essentially optimal.

私たちは、行/列の重み付きエントリワイズサンプリングの形式の下での行列完成問題を検討します。これには、特別なケースとして均一なエントリワイズサンプリングのケースも含まれます。関連するランダム観測演算子を分析し、高い確率で、重み付きフロベニウスノルムに関して制限された強い凸性の形式を満たすことを証明します。この特性を使用して、ノイズのあるサンプリングの下で、正確な低ランク行列と低ランクに近い行列の両方について、重み付きフロベニウスノルムでの行列完成に関するいくつかの誤差境界を結果として得ます。我々の結果は、以前の研究で課された非一貫性条件よりも制限の少ない、行列の「スパイク性」と「低ランク性」の尺度に基づいています。我々の手法には、解のランクとスパイク性の両方に対する制御を含むM推定量が含まれており、境界付きスパイク性のlq「ボール」にある行列を回復するための重み付きフロベニウスノルムの非漸近的誤差境界を確立します。情報理論的手法を使用して、これらの同じセットに対して、どのアルゴリズムも（対数係数まで）より良い推定値を達成できないことを示します。これは、行列と関連するレートに関する条件が本質的に最適であることを示しています。

Mixability is Bayes Risk Curvature Relative to Log Loss
混合性は、対数損失に対するベイズリスク曲率です

Mixability of a loss characterizes fast rates in the online learning setting of prediction with expert advice. The determination of the mixability constant for binary losses is straightforward but opaque. In the binary case we make this transparent and simpler by characterising mixability in terms of the second derivative of the Bayes risk of proper losses. We then extend this result to multiclass proper losses where there are few existing results. We show that mixability is governed by the maximum eigenvalue of the Hessian of the Bayes risk, relative to the Hessian of the Bayes risk for log loss. We conclude by comparing our result to other work that bounds prediction performance in terms of the geometry of the Bayes risk. Although all calculations are for proper losses, we also show how to carry the results across to improper losses.

損失の混合性は、専門家のアドバイスによる予測のオンライン学習環境での高速性を特徴付けます。バイナリ損失の混合性定数の決定は簡単ですが、不透明です。バイナリの場合、適切な損失のベイズリスクの2次導関数の観点から混合性を特徴付けることにより、これを透明でシンプルにします。次に、この結果を、既存の結果がほとんどない多クラス適正損失に拡張します。混合性は、対数損失のベイズのヘッセリスクに対するベイズリスクのヘッセの最大固有値によって支配されることを示します。最後に、ベイズリスクの形状に関して予測パフォーマンスを制限する他の作業と結果を比較します。すべての計算は適切な損失のためのものですが、その結果を不適切な損失に引き継ぐ方法も示しています。

A Unifying Probabilistic Perspective for Spectral Dimensionality Reduction: Insights and New Models
スペクトル次元削減のための統一確率的視点:洞察と新しいモデル

We introduce a new perspective on spectral dimensionality reduction which views these methods as Gaussian Markov random fields (GRFs). Our unifying perspective is based on the maximum entropy principle which is in turn inspired by maximum variance unfolding. The resulting model, which we call maximum entropy unfolding (MEU) is a nonlinear generalization of principal component analysis. We relate the model to Laplacian eigenmaps and isomap. We show that parameter fitting in the locally linear embedding (LLE) is approximate maximum likelihood MEU. We introduce a variant of LLE that performs maximum likelihood exactly: Acyclic LLE (ALLE). We show that MEU and ALLE are competitive with the leading spectral approaches on a robot navigation visualization and a human motion capture data set. Finally the maximum likelihood perspective allows us to introduce a new approach to dimensionality reduction based on L1 regularization of the Gaussian random field via the graphical lasso.

私たちは、スペクトル次元削減に関する新しい視点を導入し、これらの手法をガウスマルコフ確率場(GRF)と見なします。私たちの統一的な視点は、最大エントロピー原理に基づいており、最大エントロピーの原理は、最大分散の展開に触発されています。結果として得られるモデルは、最大エントロピーアンフォールディング(MEU)と呼ばれ、主成分分析の非線形一般化です。このモデルをラプラシアン固有マップとアイソマップに関連付けます。局所線形埋め込み(LLE)のパラメーターフィッティングが近似最尤MEUであることを示します。最尤を正確に実行するLLEのバリアントである非巡回LLE(ALLE)を紹介します。MEUとALLEは、ロボットナビゲーションの視覚化と人間のモーションキャプチャデータセットに関する主要なスペクトルアプローチと競合していることを示しています。最後に、最尤法の視点により、グラフィカルなげなわを介したガウス確率場のL1正則化に基づく次元削減への新しいアプローチを導入できます。

A Model of the Perception of Facial Expressions of Emotion by Humans: Research Overview and Perspectives
人間による感情表現の知覚モデル:研究概要と展望

In cognitive science and neuroscience, there have been two leading models describing how humans perceive and classify facial expressions of emotion—the continuous and the categorical model. The continuous model defines each facial expression of emotion as a feature vector in a face space. This model explains, for example, how expressions of emotion can be seen at different intensities. In contrast, the categorical model consists of C classifiers, each tuned to a specific emotion category. This model explains, among other findings, why the images in a morphing sequence between a happy and a surprise face are perceived as either happy or surprise but not something in between. While the continuous model has a more difficult time justifying this latter finding, the categorical model is not as good when it comes to explaining how expressions are recognized at different intensities or modes. Most importantly, both models have problems explaining how one can recognize combinations of emotion categories such as happily surprised versus angrily surprised versus surprise. To resolve these issues, in the past several years, we have worked on a revised model that justifies the results reported in the cognitive science and neuroscience literature. This model consists of C distinct continuous spaces. Multiple (compound) emotion categories can be recognized by linearly combining these C face spaces. The dimensions of these spaces are shown to be mostly configural. According to this model, the major task for the classification of facial expressions of emotion is precise, detailed detection of facial landmarks rather than recognition. We provide an overview of the literature justifying the model, show how the resulting model can be employed to build algorithms for the recognition of facial expression of emotion, and propose research directions in machine learning and computer vision researchers to keep pushing the state of the art in these areas. We also discuss how the model can aid in studies of human perception, social interactions and disorders.

認知科学と神経科学では、人間が感情の表情をどのように認識し分類するかを説明する主要なモデルが2つあります。連続モデルとカテゴリモデルです。連続モデルは、感情の表情を顔空間内の特徴ベクトルとして定義します。このモデルは、たとえば、感情の表情がさまざまな強度でどのように見えるかを説明します。対照的に、カテゴリモデルはC個の分類器で構成され、それぞれが特定の感情カテゴリに調整されています。このモデルは、他の発見の中でも、幸せな顔と驚いた顔の間のモーフィングシーケンスの画像が、幸せな顔または驚きとして認識され、その中間として認識されない理由を説明します。連続モデルでは、この後者の発見を正当化するのがより困難ですが、カテゴリモデルは、表情が異なる強度またはモードでどのように認識されるかを説明することに関してはそれほど優れていません。最も重要なことは、両方のモデルが、幸せな驚きと怒った驚きと驚きなどの感情カテゴリの組み合わせを認識する方法を説明するのに問題があることです。これらの問題を解決するために、私たちは過去数年間、認知科学と神経科学の文献で報告された結果を正当化する改訂モデルに取り組んできました。このモデルはC個の異なる連続空間で構成されています。これらのC個の顔空間を線形に組み合わせることで、複数の(複合)感情カテゴリを認識できます。これらの空間の次元は、主に構成的であることが示されています。このモデルによると、感情の顔表情の分類の主なタスクは、認識ではなく、顔のランドマークの正確で詳細な検出です。私たちは、このモデルを正当化する文献の概要を示し、結果として得られたモデルを使用して感情の顔表情を認識するアルゴリズムを構築する方法を示し、機械学習とコンピュータービジョンの研究者がこれらの分野で最先端の研究を推進し続けるための研究の方向性を提案します。また、このモデルが人間の知覚、社会的相互作用、障害の研究にどのように役立つかについても説明します。

Activized Learning: Transforming Passive to Active with Improved Label Complexity
アクティファイド・ラーニング:ラベルの複雑さを改善したパッシブからアクティブへの変換

We study the theoretical advantages of active learning over passive learning. Specifically, we prove that, in noise-free classifier learning for VC classes, any passive learning algorithm can be transformed into an active learning algorithm with asymptotically strictly superior label complexity for all nontrivial target functions and distributions. We further provide a general characterization of the magnitudes of these improvements in terms of a novel generalization of the disagreement coefficient. We also extend these results to active learning in the presence of label noise, and find that even under broad classes of noise distributions, we can typically guarantee strict improvements over the known results for passive learning.

私たちは、パッシブラーニングに対するアクティブラーニングの理論的利点を研究します。具体的には、VCクラスのノイズフリー分類器学習では、任意の受動的学習アルゴリズムを、すべての非自明なターゲット関数と分布に対して漸近的に厳密に優れたラベル複雑性を持つ能動的学習アルゴリズムに変換できることを証明します。さらに、不一致係数の新たな一般化の観点から、これらの改善の大きさの一般的な特徴付けを提供します。また、これらの結果をラベルノイズの存在下でのアクティブラーニングに拡張し、ノイズ分布の広範なクラスの下でも、通常、パッシブ学習の既知の結果よりも厳密な改善を保証できることを発見しました。

Structured Sparsity via Alternating Direction Methods
交互方向法による構造化されたスパース性

We consider a class of sparse learning problems in high dimensional feature space regularized by a structured sparsity-inducing norm that incorporates prior knowledge of the group structure of the features. Such problems often pose a considerable challenge to optimization algorithms due to the non-smoothness and non-separability of the regularization term. In this paper, we focus on two commonly adopted sparsity-inducing regularization terms, the overlapping Group Lasso penalty l1/l2-norm and the l1/l∞-norm. We propose a unified framework based on the augmented Lagrangian method, under which problems with both types of regularization and their variants can be efficiently solved. As one of the core building-blocks of this framework, we develop new algorithms using a partial-linearization/splitting technique and prove that the accelerated versions of these algorithms require O(1/√ε) iterations to obtain an ε-optimal solution. We compare the performance of these algorithms against that of the alternating direction augmented Lagrangian and FISTA methods on a collection of data sets and apply them to two real-world problems to compare the relative merits of the two norms.

私たちは、特徴のグループ構造に関する事前知識を組み込んだ構造化されたスパース性誘導ノルムによって正則化された高次元特徴空間におけるスパース学習問題のクラスを検討します。このような問題は、正則化項の非滑らかさと非分離性のために、最適化アルゴリズムにとってしばしばかなりの課題となります。この論文では、一般的に採用されている2つのスパース性誘導正則化項、重複グループLassoペナルティl1/l2ノルムとl1/l∞ ノルムに焦点を当てる。私たちは、拡張ラグランジュ法に基づく統一フレームワークを提案します。このフレームワークでは、両方のタイプの正則化とその変種を含む問題を効率的に解決できます。このフレームワークのコアビルディングブロックの1つとして、部分線形化/分割手法を使用する新しいアルゴリズムを開発し、これらのアルゴリズムの高速化バージョンでは ε 最適解を得るためにO(1/√ε)回の反復が必要であることを証明します。これらのアルゴリズムのパフォーマンスを、データセットのコレクションに対する交互方向拡張ラグランジアン法およびFISTA法のパフォーマンスと比較し、2つの実際の問題に適用して、2つの基準の相対的なメリットを比較します。

Feature Selection via Dependence Maximization
依存性最大化による機能選択

We introduce a framework for feature selection based on dependence maximization between the selected features and the labels of an estimation problem, using the Hilbert-Schmidt Independence Criterion. The key idea is that good features should be highly dependent on the labels. Our approach leads to a greedy procedure for feature selection. We show that a number of existing feature selectors are special cases of this framework. Experiments on both artificial and real-world data show that our feature selector works well in practice.

私たちは、選択した特徴と推定問題のラベルとの間の依存性最大化に基づく特徴選択のフレームワークを、ヒルベルト・シュミット独立性基準を使用して導入します。重要な考え方は、優れた機能はラベルに大きく依存する必要があるということです。私たちのアプローチは、機能選択のための貪欲な手順につながります。既存の多くの機能セレクターが、このフレームワークの特殊なケースであることを示します。人工データと実世界のデータの両方での実験は、私たちの特徴セレクターが実際にうまく機能することを示しています。

On Ranking and Generalization Bounds
ランク付けと一般化の境界について

The problem of ranking is to predict or to guess the ordering between objects on the basis of their observed features. In this paper we consider ranking estimators that minimize the empirical convex risk. We prove generalization bounds for the excess risk of such estimators with rates that are faster than 1/√n. We apply our results to commonly used ranking algorithms, for instance boosting or support vector machines. Moreover, we study the performance of considered estimators on real data sets.

ランク付けの問題は、観測された特徴に基づいてオブジェクト間の順序を予測または推測することです。この論文では、経験的凸リスクを最小化する推定量のランク付けについて検討します。このような推定量の過剰リスクの一般化限界を、1/√nよりも速いレートで証明します。私たちは、ブースティングマシンやサポートベクターマシンなど、一般的に使用されるランキングアルゴリズムに結果を適用します。さらに、実際のデータセットでの考慮された推定量のパフォーマンスを研究します。

Transfer in Reinforcement Learning via Shared Features
共有特徴量による強化学習での転移

We present a framework for transfer in reinforcement learning based on the idea that related tasks share some common features, and that transfer can be achieved via those shared features. The framework attempts to capture the notion of tasks that are related but distinct, and provides some insight into when transfer can be usefully applied to a problem sequence and when it cannot. We apply the framework to the knowledge transfer problem, and show that an agent can learn a portable shaping function from experience in a sequence of tasks to significantly improve performance in a later related task, even given a very brief training period. We also apply the framework to skill transfer, to show that agents can learn portable skills across a sequence of tasks that significantly improve performance on later related tasks, approaching the performance of agents given perfectly learned problem-specific skills.

私たちは、関連するタスクがいくつか共通の特徴を共有し、それらの共有された特徴を介して伝達が達成できるという考えに基づいて、強化学習における転移のフレームワークを提示します。このフレームワークは、関連しているが異なるタスクの概念を捉えようとし、問題シーケンスに転送を有効に適用できる場合と適用できない場合についての洞察を提供します。このフレームワークを知識伝達問題に適用し、エージェントが一連のタスクの経験からポータブルシェーピング機能を学習して、非常に短いトレーニング期間であっても、後の関連タスクのパフォーマンスを大幅に向上させることができることを示します。また、このフレームワークをスキル移転に適用し、エージェントが一連のタスク全体でポータブルスキルを学習し、後の関連タスクのパフォーマンスを大幅に向上させ、完全に学習した問題固有のスキルを与えられたエージェントのパフォーマンスに近づくことができることを示します。

Query Strategies for Evading Convex-Inducing Classifiers
凸誘導分類器を回避するためのクエリ戦略

Classifiers are often used to detect miscreant activities. We study how an adversary can systematically query a classifier to elicit information that allows the attacker to evade detection while incurring a near-minimal cost of modifying their intended malfeasance. We generalize the theory of Lowd and Meek (2005) to the family of convex-inducing classifiers that partition their feature space into two sets, one of which is convex. We present query algorithms for this family that construct undetected instances of approximately minimal cost using only polynomially-many queries in the dimension of the space and in the level of approximation. Our results demonstrate that near-optimal evasion can be accomplished for this family without reverse engineering the classifier’s decision boundary. We also consider general lp costs and show that near-optimal evasion on the family of convex-inducing classifiers is generally efficient for both positive and negative convexity for all levels of approximation if p=1.

分類器は、不正行為の検出によく使用されます。私たちは、攻撃者が分類器を体系的にクエリして、攻撃者が検出を回避しながら、意図した不正行為を修正するためのコストをほぼ最小限に抑えることができる情報を聞き出す方法を研究します。LowdとMeek (2005)の理論を、特徴空間を2つのセット(そのうちの1つは凸)に分割する凸誘導分類器のファミリーに一般化します。私たちは、空間の次元と近似レベルで多項式数のクエリのみを使用して、ほぼ最小限のコストで検出されないインスタンスを構築する、このファミリーのクエリアルゴリズムを紹介します。私たちの結果は、分類器の決定境界をリバースエンジニアリングしなくても、このファミリーでほぼ最適な回避を実現できることを示しています。また、一般的なlpコストを考慮し、凸誘導分類器のファミリーでのほぼ最適な回避は、p=1の場合、すべての近似レベルで正と負の凸の両方に対して一般的に効率的であることを示します。

Minimax Manifold Estimation
ミニマックス多様体推定

We find the minimax rate of convergence in Hausdorff distance for estimating a manifold M of dimension d embedded in ℝD given a noisy sample from the manifold. Under certain conditions, we show that the optimal rate of convergence is n-2/(2+d). Thus, the minimax rate depends only on the dimension of the manifold, not on the dimension of the space in which M is embedded.

私たちは、多様体からのノイズの多いサンプルが与えられたときに、RDに埋め込まれた次元dの多様体Mを推定するためのハウスドルフ距離の最小収束率を見つけます。特定の条件下では、最適な収束速度がn-2/(2+d)であることを示します。したがって、ミニマックスレートは多様体の次元にのみ依存し、Mが埋め込まれている空間の次元には依存しません。

A Geometric Approach to Sample Compression
サンプル圧縮への幾何学的アプローチ

The Sample Compression Conjecture of Littlestone & Warmuth has remained unsolved for a quarter century. While maximum classes (concept classes meeting Sauer’s Lemma with equality) can be compressed, the compression of general concept classes reduces to compressing maximal classes (classes that cannot be expanded without increasing VC dimension). Two promising ways forward are: embedding maximal classes into maximum classes with at most a polynomial increase to VC dimension, and compression via operating on geometric representations. This paper presents positive results on the latter approach and a first negative result on the former, through a systematic investigation of finite maximum classes. Simple arrangements of hyperplanes in hyperbolic space are shown to represent maximum classes, generalizing the corresponding Euclidean result. We show that sweeping a generic hyperplane across such arrangements forms an unlabeled compression scheme of size VC dimension and corresponds to a special case of peeling the one-inclusion graph, resolving a recent conjecture of Kuzmin & Warmuth. A bijection between finite maximum classes and certain arrangements of piecewise-linear (PL) hyperplanes in either a ball or Euclidean space is established. Finally we show that d-maximum classes corresponding to PL-hyperplane arrangements in ℝd have cubical complexes homeomorphic to a d-ball, or equivalently complexes that are manifolds with boundary. A main result is that PL arrangements can be swept by a moving hyperplane to unlabeled d-compress any finite maximum class, forming a peeling scheme as conjectured by Kuzmin & Warmuth. A corollary is that some d-maximal classes cannot be embedded into any maximum class of VC-dimension d+k, for any constant k. The construction of the PL sweeping involves Pachner moves on the one-inclusion graph, corresponding to moves of a hyperplane across the intersection of d other hyperplanes. This extends the well known Pachner moves for triangulations to cubical complexes.

LittlestoneとWarmuthのサンプル圧縮予想は、四半世紀もの間未解決のままでした。最大クラス(Sauerの補題を等式で満たす概念クラス)は圧縮できますが、一般概念クラスの圧縮は最大クラス(VC次元を増やさずに拡張できないクラス)の圧縮に帰着します。有望な2つの方法は、VC次元を最大クラスに最大クラスに埋め込む(VC次元を最大で多項式で増やす)方法と、幾何学的表現を操作することで圧縮する方法です。この論文では、有限最大クラスの体系的な調査を通じて、後者のアプローチに関する肯定的な結果と前者に関する最初の否定的な結果を示します。双曲空間における超平面の単純な配置は最大クラスを表すことが示され、対応するユークリッドの結果を一般化します。このような配置にわたって一般的な超平面をスイープすると、サイズVC次元のラベルなし圧縮スキームが形成され、1つの包含グラフを剥がす特殊なケースに対応し、KuzminとWarmuthの最近の予想を解決することを示します。有限最大類と、球またはユークリッド空間内の区分線形(PL)超平面の特定の配置との間の一対一性が確立されています。最後に、ℝd内のPL超平面配置に対応するd最大類には、d球に同相な立方体複体、または境界を持つ多様体である複体があることを示しています。主な結果は、PL配置は移動超平面によって任意の有限最大類のラベルなしd圧縮にスイープでき、Kuzmin & Warmuthによって推測された剥離スキームを形成できることです。系として、任意の定数kに対して、一部のd最大類はVC次元d+kの任意の最大類に埋め込むことができない。PLスイープの構築には、1つの包含グラフ上のPachner移動が含まれ、これはd個の他の超平面の交差点を横切る超平面の移動に対応します。これは、よく知られている三角形分割のPachner移動を立方体複体に拡張したものです。

A Multi-Stage Framework for Dantzig Selector and LASSO
Dantzig SelectorとLASSOのための多段階フレームワーク

We consider the following sparse signal recovery (or feature selection) problem: given a design matrix X∈ ℝn✕ m (m >> n) and a noisy observation vector y∈ ℝn satisfying y=Xβ*+ε where ε is the noise vector following a Gaussian distribution N(0,σ2I), how to recover the signal (or parameter vector) β* when the signal is sparse? The Dantzig selector has been proposed for sparse signal recovery with strong theoretical guarantees. In this paper, we propose a multi-stage Dantzig selector method, which iteratively refines the target signal β*. We show that if X obeys a certain condition, then with a large probability the difference between the solution β̂ estimated by the proposed method and the true solution β* measured in terms of the lp norm (p≥ 1) is bounded as ||β̂-β*||p≤ (C(s-N)1/p√log m+Δ)σ, where C is a constant, s is the number of nonzero entries in β*, the risk of the oracle estimator Δ is independent of m and is much smaller than the first term, and N is the number of entries of β* larger than a certain value in the order of O(σ√log m). The proposed method improves the estimation bound of the standard Dantzig selector approximately from Cs1/p√log mσ to C(s-N)1/p√log mσ where the value N depends on the number of large entries in β*. When N=s, the proposed algorithm achieves the oracle solution with a high probability, where the oracle solution is the projection of the observation vector y onto true features. In addition, with a large probability, the proposed method can select the same number of correct features under a milder condition than the Dantzig selector. Finally, we extend this multi-stage procedure to the LASSO case.

私たちは、次のようなスパース信号回復（または特徴選択）問題を検討します。設計行列X∈ ℝn✕m（m >> n）と、y=Xβ*+ε（εはガウス分布N（0,σ2I）に従うノイズベクトル）を満たすノイズのある観測ベクトルy∈ ℝnが与えられた場合、信号がスパースであるときに信号（またはパラメータベクトル）β*をどのように回復するか？Dantzigセレクタは、強力な理論的保証を備えたスパース信号回復のために提案されています。この論文では、ターゲット信号β*を反復的に改良する多段Dantzigセレクタ法を提案します。Xが特定の条件に従う場合、提案手法で推定された解 β̂ とlpノルム(p≥1)で測定された真の解 β*との差が、高い確率で||β̂-β*||p≤(C(s-N)1/p√log m+Δ)σ で制限されることを示します。ここで、Cは定数、sは β*の非ゼロエントリの数、オラクル推定量 Δ のリスクはmに依存せず最初の項よりもはるかに小さく、NはO(σ√log m)のオーダーで特定の値より大きい β*のエントリの数です。提案手法は、標準Dantzigセレクタの推定境界をCs1/p√log mσ からC(s-N)1/p√log mσ に近似的に改善します。ここで、値Nは β*の大きなエントリの数に依存します。N=sの場合、提案アルゴリズムは高い確率でオラクルソリューションを達成します。ここで、オラクルソリューションは、観測ベクトルyを真の特徴に投影したものです。さらに、高い確率で、提案方法は、Dantzigセレクタよりも緩やかな条件下で、同じ数の正しい特徴を選択できます。最後に、この多段階手順をLASSOのケースに拡張します。

Hope and Fear for Discriminative Training of Statistical Translation Models
統計的翻訳モデルの判別学習に対する希望と恐怖

In machine translation, discriminative models have almost entirely supplanted the classical noisy-channel model, but are standardly trained using a method that is reliable only in low-dimensional spaces. Two strands of research have tried to adapt more scalable discriminative training methods to machine translation: the first uses log-linear probability models and either maximum likelihood or minimum risk, and the other uses linear models and large-margin methods. Here, we provide an overview of the latter. We compare several learning algorithms and describe in detail some novel extensions suited to properties of the translation task: no single correct output, a large space of structured outputs, and slow inference. We present experimental results on a large-scale Arabic-English translation task, demonstrating large gains in translation accuracy.

機械翻訳では、識別モデルが従来のノイズの多いチャネルモデルにほぼ完全に取って代わりましたが、低次元空間でのみ信頼性のある方法を使用して標準的にトレーニングされています。よりスケーラブルな識別学習方法を機械翻訳に適応させようと試みた研究は、1つ目は対数線形確率モデルと最大尤法または最小リスクのいずれかを使用し、もう1つは線形モデルと大マージン法を使用するという2つの研究が行われています。ここでは、後者の概要を説明します。いくつかの学習アルゴリズムを比較し、翻訳タスクの特性に適したいくつかの新しい拡張について詳しく説明します:単一の正しい出力がない、構造化された出力の広いスペース、および推論が遅い。大規模なアラビア語-英語翻訳タスクの実験結果を発表し、翻訳精度が大幅に向上したことを実証します。

Towards Integrative Causal Analysis of Heterogeneous Data Sets and Studies
異種データセットと研究の統合的因果分析に向けて

We present methods able to predict the presence and strength of conditional and unconditional dependencies (correlations) between two variables Y and Z never jointly measured on the same samples, based on multiple data sets measuring a set of common variables. The algorithms are specializations of prior work on learning causal structures from overlapping variable sets. This problem has also been addressed in the field of statistical matching. The proposed methods are applied to a wide range of domains and are shown to accurately predict the presence of thousands of dependencies. Compared against prototypical statistical matching algorithms and within the scope of our experiments, the proposed algorithms make predictions that are better correlated with the sample estimates of the unknown parameters on test data ; this is particularly the case when the number of commonly measured variables is low. The enabling idea behind the methods is to induce one or all causal models that are simultaneously consistent with (fit) all available data sets and prior knowledge and reason with them. This allows constraints stemming from causal assumptions (e.g., Causal Markov Condition, Faithfulness) to propagate. Several methods have been developed based on this idea, for which we propose the unifying name Integrative Causal Analysis (INCA). A contrived example is presented demonstrating the theoretical potential to develop more general methods for co-analyzing heterogeneous data sets. The computational experiments with the novel methods provide evidence that causally-inspired assumptions such as Faithfulness often hold to a good degree of approximation in many real systems and could be exploited for statistical inference. Code, scripts, and data are available at www.mensxmachina.org.

私たちは、共通の変数セットを測定する複数のデータセットに基づいて、同じサンプルで共同測定されたことのない2つの変数YとZ間の条件付きおよび無条件の依存関係(相関)の存在と強度を予測できる方法を紹介します。このアルゴリズムは、重複する変数セットから因果構造を学習する以前の研究を特殊化したものです。この問題は、統計的マッチングの分野でも取り上げられています。提案された方法は、幅広い領域に適用され、何千もの依存関係の存在を正確に予測できることが示されています。プロトタイプの統計的マッチングアルゴリズムと比較し、実験の範囲内で、提案されたアルゴリズムは、テストデータ上の未知のパラメーターのサンプル推定値とより相関性の高い予測を行います。これは、共通して測定される変数の数が少ない場合に特に当てはまります。この方法の背後にある有効なアイデアは、利用可能なすべてのデータセットと事前の知識と同時に一致(適合)し、それらを推論する1つまたはすべての因果モデルを誘導することです。これにより、因果的仮定(因果マルコフ条件、忠実性など)から生じる制約が伝播します。このアイデアに基づいていくつかの方法が開発されており、私たちはこれらを統合的因果分析(INCA)という統一名で提案します。異種データセットを共同分析するためのより一般的な方法を開発する理論的可能性を示す、人為的な例が提示されています。新しい方法を使用した計算実験により、忠実性などの因果的仮定は多くの実際のシステムでかなりの近似値を保持することが多く、統計的推論に利用できるという証拠が提供されます。コード、スクリプト、およびデータはwww.mensxmachina.orgで入手できます。

Analysis of a Random Forests Model
ランダムフォレストモデルの解析

Random forests are a scheme proposed by Leo Breiman in the 2000’s for building a predictor ensemble with a set of decision trees that grow in randomly selected subspaces of data. Despite growing interest and practical use, there has been little exploration of the statistical properties of random forests, and little is known about the mathematical forces driving the algorithm. In this paper, we offer an in-depth analysis of a random forests model suggested by Breiman (2004), which is very close to the original algorithm. We show in particular that the procedure is consistent and adapts to sparsity, in the sense that its rate of convergence depends only on the number of strong features and not on how many noise variables are present.

ランダムフォレストは、2000年代にLeo Breimanによって提案されたスキームで、ランダムに選択されたデータのサブスペースで成長する一連の決定木を使用して予測子アンサンブルを構築します。関心と実用化が高まっているにもかかわらず、ランダムフォレストの統計的特性の調査はほとんど行われておらず、アルゴリズムを駆動する数学的力についてはほとんど知られていません。この論文では、Breiman (2004)によって提案されたランダムフォレストモデルの詳細な分析を提供します。これは、元のアルゴリズムに非常に近いものです。特に、この手順は一貫性があり、収束速度が強い特徴の数にのみ依存し、存在するノイズ変数の数には依存しないという意味で、スパース性に適応することを示します。

The huge Package for High-dimensional Undirected Graph Estimation in R
Rでの高次元無向グラフ推定のための巨大なパッケージ

We describe an R package named huge which provides easy-to-use functions for estimating high dimensional undirected graphs from data. This package implements recent results in the literature, including Friedman et al. (2007), Liu et al. (2009, 2012) and Liu et al. (2010). Compared with the existing graph estimation package glasso, the huge package provides extra features: (1) instead of using Fortan, it is written in C, which makes the code more portable and easier to modify; (2) besides fitting Gaussian graphical models, it also provides functions for fitting high dimensional semiparametric Gaussian copula models; (3) more functions like data-dependent model selection, data generation and graph visualization; (4) a minor convergence problem of the graphical lasso algorithm is corrected; (5) the package allows the user to apply both lossless and lossy screening rules to scale up large-scale problems, making a tradeoff between computational and statistical efficiency.

私たちは、データから高次元の無向グラフを推定するための使いやすい関数を提供するhugeという名前のRパッケージについて説明します。このパッケージは、Friedmanら(2007)、Liuら(2009, 2012)、Liuら(2010)などの文献の最近の結果を実装しています。既存のグラフ推定パッケージglassoと比較して、巨大なパッケージは追加機能を提供します:(1)Fortanを使用する代わりに、Cで書かれているため、コードの移植性が高く、変更が容易になります。(2)ガウスグラフィカルモデルをフィッティングするだけでなく、高次元セミパラメトリックガウシアンコピュラモデルをフィッティングするための関数も提供します。(3)データ依存モデルの選択、データ生成、グラフの視覚化など、より多くの機能。(4)グラフィカルなげなわアルゴリズムのマイナーな収束問題が修正されます。(5)このパッケージにより、ユーザーはロスレスとロスの両方のスクリーニングルールを適用して大規模な問題をスケールアップでき、計算効率と統計効率の間でトレードオフが行われます。

Consistent Model Selection Criteria on High Dimensions
高次元での一貫したモデル選択基準

Asymptotic properties of model selection criteria for high-dimensional regression models are studied where the dimension of covariates is much larger than the sample size. Several sufficient conditions for model selection consistency are provided. Non-Gaussian error distributions are considered and it is shown that the maximal number of covariates for model selection consistency depends on the tail behavior of the error distribution. Also, sufficient conditions for model selection consistency are given when the variance of the noise is neither known nor estimated consistently. Results of simulation studies as well as real data analysis are given to illustrate that finite sample performances of consistent model selection criteria can be quite different.

高次元回帰モデルのモデル選択基準の漸近特性は、共変量の次元がサンプルサイズよりもはるかに大きい場合に研究されます。モデル選択の一貫性を保つためのいくつかの十分な条件が提供されます。非ガウス誤差分布が考慮され、モデル選択一貫性の共変量の最大数は、誤差分布の裾の動作に依存することが示されています。また、ノイズの分散が既知でもなく、一貫して推定されない場合にも、モデル選択の一貫性のための十分な条件が与えられます。シミュレーション研究の結果と実際のデータ分析の結果は、一貫したモデル選択基準の有限サンプル性能がかなり異なる可能性があることを示しています。

Positive Semidefinite Metric Learning Using Boosting-like Algorithms
ブースティングライクアルゴリズムを用いた正の半定値計量学習

The success of many machine learning and pattern recognition methods relies heavily upon the identification of an appropriate distance metric on the input data. It is often beneficial to learn such a metric from the input training data, instead of using a default one such as the Euclidean distance. In this work, we propose a boosting-based technique, termed BOOSTMETRIC, for learning a quadratic Mahalanobis distance metric. Learning a valid Mahalanobis distance metric requires enforcing the constraint that the matrix parameter to the metric remains positive semidefinite. Semidefinite programming is often used to enforce this constraint, but does not scale well and is not easy to implement. BOOSTMETRIC is instead based on the observation that any positive semidefinite matrix can be decomposed into a linear combination of trace-one rank-one matrices. BOOSTMETRIC thus uses rank-one positive semidefinite matrices as weak learners within an efficient and scalable boosting-based learning process. The resulting methods are easy to implement, efficient, and can accommodate various types of constraints. We extend traditional boosting algorithms in that its weak learner is a positive semidefinite matrix with trace and rank being one rather than a classifier or regressor. Experiments on various data sets demonstrate that the proposed algorithms compare favorably to those state-of-the-art methods in terms of classification accuracy and running time.

多くの機械学習およびパターン認識法の成功は、入力データに対する適切な距離メトリックの識別に大きく依存しています。ユークリッド距離などのデフォルトのメトリックを使用する代わりに、入力トレーニングデータからそのようなメトリックを学習する方が有益な場合がよくあります。この研究では、2次マハラノビス距離メトリックを学習するためのブースティングベースの手法(BOOSTMETRICと名付けました)を提案します。有効なマハラノビス距離メトリックを学習するには、メトリックの行列パラメータが正の半正定値のままであるという制約を適用する必要があります。この制約を適用するために半正定値計画法がよく使用されますが、スケーラビリティが低く、実装が容易ではありません。代わりに、BOOSTMETRICは、任意の正の半正定値行列はトレース1ランク1行列の線形結合に分解できるという観察に基づいています。したがって、BOOSTMETRICは、効率的でスケーラブルなブースティングベースの学習プロセス内で、ランク1正の半正定値行列を弱学習器として使用します。結果として得られる方法は実装が容易で、効率的であり、さまざまな種類の制約に対応できます。私たちは、従来のブースティングアルゴリズムを拡張し、その弱学習器を分類器や回帰器ではなく、トレースとランクが1である正の半定値行列にします。さまざまなデータセットでの実験により、提案されたアルゴリズムは、分類精度と実行時間の点で最先端の方法に匹敵することが実証されています。

Sampling Methods for the Nyström Method
ナイストロム法のサンプリング法

The Nyström method is an efficient technique to generate low-rank matrix approximations and is used in several large-scale learning applications. A key aspect of this method is the procedure according to which columns are sampled from the original matrix. In this work, we explore the efficacy of a variety of fixed and adaptive sampling schemes. We also propose a family of ensemble-based sampling algorithms for the Nyström method. We report results of extensive experiments that provide a detailed comparison of various fixed and adaptive sampling techniques, and demonstrate the performance improvement associated with the ensemble Nyström method when used in conjunction with either fixed or adaptive sampling schemes. Corroborating these empirical findings, we present a theoretical analysis of the Nyström method, providing novel error bounds guaranteeing a better convergence rate of the ensemble Nyström method in comparison to the standard Nyström method.

Nyström法は、低ランクの行列近似を生成する効率的な手法であり、いくつかの大規模な学習アプリケーションで使用されています。この方法の重要な側面は、元の行列から列をサンプリングする手順です。この研究では、さまざまな固定サンプリングスキームと適応サンプリングスキームの有効性を探ります。また、ナイストロム法のためのアンサンブルベースのサンプリングアルゴリズムのファミリーを提案します。さまざまな固定サンプリング手法と適応サンプリング手法の詳細な比較を提供する広範な実験の結果を報告し、固定サンプリングスキームまたは適応サンプリングスキームと組み合わせて使用した場合のアンサンブルNyström法に関連するパフォーマンスの向上を実証します。これらの経験的知見を裏付けるために、ナイストロム法の理論的分析を提示し、標準的なナイストロム法と比較してアンサンブルナイストロム法のより良い収束率を保証する新しい誤差範囲を提供します。

Mal-ID: Automatic Malware Detection Using Common Segment Analysis and Meta-Features
Mal-ID:一般的なセグメント分析とメタ機能を使用した自動マルウェア検出

This paper proposes several novel methods, based on machine learning, to detect malware in executable files without any need for preprocessing, such as unpacking or disassembling. The basic method (Mal-ID) is a new static (form-based) analysis methodology that uses common segment analysis in order to detect malware files. By using common segment analysis, Mal-ID is able to discard malware parts that originate from benign code. In addition, Mal-ID uses a new kind of feature, termed meta-feature, to better capture the properties of the analyzed segments. Rather than using the entire file, as is usually the case with machine learning based techniques, the new approach detects malware on the segment level. This study also introduces two Mal-ID extensions that improve the Mal-ID basic method in various aspects. We rigorously evaluated Mal-ID and its two extensions with more than ten performance measures, and compared them to the highly rated boosted decision tree method under identical settings. The evaluation demonstrated that Mal-ID and the two Mal-ID extensions outperformed the boosted decision tree method in almost all respects. In addition, the results indicated that by extracting meaningful features, it is sufficient to employ one simple detection rule for classifying executable files.

この論文では、解凍や逆アセンブルなどの前処理を必要とせずに実行可能ファイル内のマルウェアを検出する、機械学習に基づくいくつかの新しい方法を提案します。基本方法(Mal-ID)は、共通セグメント分析を使用してマルウェアファイルを検出する新しい静的(フォームベース)分析方法です。共通セグメント分析を使用することで、Mal-IDは無害なコードに由来するマルウェア部分を破棄できます。さらに、Mal-IDは、メタ機能と呼ばれる新しい種類の機能を使用して、分析されたセグメントの特性をより適切にキャプチャします。機械学習ベースの手法では通常ファイル全体を使用するのに対し、この新しいアプローチではセグメントレベルでマルウェアを検出します。この研究では、Mal-ID基本方法をさまざまな面で改善する2つのMal-ID拡張機能も紹介します。Mal-IDとその2つの拡張機能を10を超えるパフォーマンス測定で厳密に評価し、同じ設定で評価の高いブースト決定木方法と比較しました。評価により、Mal-IDと2つのMal-ID拡張機能は、ほぼすべての点でブースト決定木方法よりも優れていることが示されました。さらに、意味のある特徴を抽出することで、実行可能ファイルを分類するための1つの単純な検出ルールを採用するだけで十分であることが示されました。

Stability of Density-Based Clustering
密度ベースクラスタリングの安定性

High density clusters can be characterized by the connected components of a level set L(λ) = {x: p(x)>λ} of the underlying probability density function p generating the data, at some appropriate level λ ≥ 0. The complete hierarchical clustering can be characterized by a cluster tree T= ∪λL(λ). In this paper, we study the behavior of a density level set estimate L̂(λ) and cluster tree estimate T̂ based on a kernel density estimator with kernel bandwidth h. We define two notions of instability to measure the variability of L̂(λ) and T̂ as a function of h, and investigate the theoretical properties of these instability measures.

高密度クラスターは、データを生成する基になる確率密度関数pのレベルセットL(λ)= {x:p(x)>λ}の接続されたコンポーネントによって、適切なレベルλ≥0で特徴付けることができます。完全な階層クラスタリングは、クラスターツリーT=∪λL(λ)によって特徴付けることができます。この論文では、カーネル帯域幅hを持つカーネル密度推定器に基づく密度レベル集合推定L̂(λ)とクラスタツリー推定T̂の振る舞いについて検討します。L̂(λ)とT̂の変動性をhの関数として測定するために、不安定性の2つの概念を定義し、これらの不安定性測定の理論的特性を調査します。

Algebraic Geometric Comparison of Probability Distributions
確率分布の代数的幾何学的比較

We propose a novel algebraic algorithmic framework for dealing with probability distributions represented by their cumulants such as the mean and covariance matrix. As an example, we consider the unsupervised learning problem of finding the subspace on which several probability distributions agree. Instead of minimizing an objective function involving the estimated cumulants, we show that by treating the cumulants as elements of the polynomial ring we can directly solve the problem, at a lower computational cost and with higher accuracy. Moreover, the algebraic viewpoint on probability distributions allows us to invoke the theory of algebraic geometry, which we demonstrate in a compact proof for an identifiability criterion.

私たちは、平均行列や共分散行列などのキュムラントによって表される確率分布を扱うための新しい代数的アルゴリズムフレームワークを提案します。例として、いくつかの確率分布が一致する部分空間を見つける教師なし学習の問題を考えます。推定されたキュムラントを含む目的関数を最小化する代わりに、キュムラントを多項式リングの要素として扱うことで、問題を直接解くことができ、計算コストが低く、精度が高いことを示します。さらに、確率分布に関する代数的視点は、代数幾何学の理論を呼び出すことを可能にし、それを同定可能性基準のコンパクトな証明で実証します。

NIMFA : A Python Library for Nonnegative Matrix Factorization
NIMFA : 非負行列因数分解のためのPythonライブラリ

NIMFA is an open-source Python library that provides a unified interface to nonnegative matrix factorization algorithms. It includes implementations of state-of-the-art factorization methods, initialization approaches, and quality scoring. It supports both dense and sparse matrix representation. NIMFA’s component-based implementation and hierarchical design should help the users to employ already implemented techniques or design and code new strategies for matrix factorization tasks.

NIMFAは、非負行列因数分解アルゴリズムへの統一インターフェイスを提供するオープンソースのPythonライブラリです。これには、最先端の因数分解法、初期化アプローチ、および品質スコアリングの実装が含まれます。密行列表現と疎行列表現の両方をサポートします。NIMFAのコンポーネントベースの実装と階層設計は、ユーザーが既に実装されている手法を使用したり、行列分解タスクの新しい戦略を設計およびコーディングしたりするのに役立つはずです。

Causal Bounds and Observable Constraints for Non-deterministic Models
非決定論的モデルの因果的境界と観測可能制約

Conditional independence relations involving latent variables do not necessarily imply observable independences. They may imply inequality constraints on observable parameters and causal bounds, which can be used for falsification and identification. The literature on computing such constraints often involve a deterministic underlying data generating process in a counterfactual framework. If an analyst is ignorant of the nature of the underlying mechanisms then they may wish to use a model which allows the underlying mechanisms to be probabilistic. A method of computation for a weaker model without any determinism is given here and demonstrated for the instrumental variable model, though applicable to other models. The approach is based on the analysis of mappings with convex polytopes in a decision theoretic framework and can be implemented in readily available polyhedral computation software. Well known constraints and bounds are replicated in a probabilistic model and novel ones are computed for instrumental variable models without non-deterministic versions of the randomization, exclusion restriction and monotonicity assumptions respectively.

潜在変数を含む条件付き独立関係は、必ずしも観測可能な独立性を意味するわけではありません。観測可能なパラメータと因果境界に対する不等式制約を意味する場合があり、これは反証や識別に使用できます。このような制約の計算に関する文献には、反事実的フレームワークにおける決定論的な基礎データ生成プロセスが含まれることがよくあります。分析者が基礎となるメカニズムの性質を知らない場合は、基礎となるメカニズムが確率的であることを認めるモデルを使用することをお勧めします。ここでは、決定論のない弱いモデルの計算方法を示し、他のモデルにも適用できますが、操作変数モデルについて実証します。このアプローチは、決定理論フレームワークにおける凸多面体とのマッピングの分析に基づいており、すぐに入手できる多面体計算ソフトウェアで実装できます。よく知られている制約と境界は確率モデルで複製され、新しい制約と境界は、ランダム化、排除制限、単調性仮定の非決定論的バージョンをそれぞれ使用せずに操作変数モデルに対して計算されます。

Algorithms for Learning Kernels Based on Centered Alignment
中央揃えに基づくカーネル学習アルゴリズム

This paper presents new and effective algorithms for learning kernels. In particular, as shown by our empirical results, these algorithms consistently outperform the so-called uniform combination solution that has proven to be difficult to improve upon in the past, as well as other algorithms for learning kernels based on convex combinations of base kernels in both classification and regression. Our algorithms are based on the notion of centered alignment which is used as a similarity measure between kernels or kernel matrices. We present a number of novel algorithmic, theoretical, and empirical results for learning kernels based on our notion of centered alignment. In particular, we describe efficient algorithms for learning a maximum alignment kernel by showing that the problem can be reduced to a simple QP and discuss a one-stage algorithm for learning both a kernel and a hypothesis based on that kernel using an alignment-based regularization. Our theoretical results include a novel concentration bound for centered alignment between kernel matrices, the proof of the existence of effective predictors for kernels with high alignment, both for classification and for regression, and the proof of stability-based generalization bounds for a broad family of algorithms for learning kernels based on centered alignment. We also report the results of experiments with our centered alignment-based algorithms in both classification and regression.

この論文では、カーネルを学習するための新しい効果的なアルゴリズムを紹介します。特に、実験結果が示すように、これらのアルゴリズムは、これまで改善が困難であることが証明されているいわゆる均一な組み合わせソリューションや、分類と回帰の両方でベースカーネルの凸結合に基づくカーネル学習の他のアルゴリズムよりも一貫して優れています。私たちのアルゴリズムは、カーネルまたはカーネル行列間の類似性尺度として使用される中心アラインメントの概念に基づいています。中心アラインメントの概念に基づくカーネル学習に関する、いくつかの新しいアルゴリズム的、理論的、および実験的結果を紹介します。特に、問題を単純なQPに縮小できることを示して最大アラインメントカーネルを学習するための効率的なアルゴリズムについて説明し、アラインメントベースの正則化を使用してカーネルとそのカーネルに基づく仮説の両方を学習するための1段階アルゴリズムについて説明します。私たちの理論的結果には、カーネル行列間の中心アラインメントの新しい集中境界、分類と回帰の両方において高いアラインメントを持つカーネルの有効な予測子の存在の証明、および中心アラインメントに基づいてカーネルを学習する広範なアルゴリズムファミリの安定性に基づく一般化境界の証明が含まれています。また、分類と回帰の両方での中心アラインメントベースのアルゴリズムを使用した実験の結果も報告します。

Exact Covariance Thresholding into Connected Components for Large-Scale Graphical Lasso
大規模グラフィカルラッソのための接続されたコンポーネントへの正確な共分散閾値化

We consider the sparse inverse covariance regularization problem or graphical lasso with regularization parameter λ. Suppose the sample covariance graph formed by thresholding the entries of the sample covariance matrix at λ is decomposed into connected components. We show that the vertex-partition induced by the connected components of the thresholded sample covariance graph (at λ) is exactly equal to that induced by the connected components of the estimated concentration graph, obtained by solving the graphical lasso problem for the same λ. This characterizes a very interesting property of a path of graphical lasso solutions. Furthermore, this simple rule, when used as a wrapper around existing algorithms for the graphical lasso, leads to enormous performance gains. For a range of values of λ, our proposal splits a large graphical lasso problem into smaller tractable problems, making it possible to solve an otherwise infeasible large-scale problem. We illustrate the graceful scalability of our proposal via synthetic and real-life microarray examples.

私たちは、正規化パラメータ λ を持つスパース逆共分散正規化問題、またはグラフィカルLassoについて考察します。サンプル共分散行列のエントリを λ でしきい値処理して形成されたサンプル共分散グラフが、連結成分に分解されると仮定します。しきい値処理されたサンプル共分散グラフ(λ で)の連結成分によって誘導される頂点パーティションは、同じ λ のグラフィカルLasso問題を解くことによって得られる推定濃度グラフの連結成分によって誘導される頂点パーティションとまったく同じであることを示します。これは、グラフィカルLassoソリューションのパスの非常に興味深い特性を特徴付けます。さらに、この単純なルールは、グラフィカルLassoの既存のアルゴリズムのラッパーとして使用すると、パフォーマンスが大幅に向上します。λ の値の範囲に対して、我々の提案は、大規模なグラフィカルLasso問題をより小さく扱いやすい問題に分割し、そうでなければ実行不可能な大規模な問題を解決できるようにします。私たちは、合成および実際のマイクロアレイの例を使用して、我々の提案の優れたスケーラビリティを示します。

GPLP: A Local and Parallel Computation Toolbox for Gaussian Process Regression
GPLP:ガウス過程回帰のためのローカルおよび並列計算ツールボックス

This paper presents the Getting-started style documentation for the local and parallel computation toolbox for Gaussian process regression (GPLP), an open source software package written in Matlab (but also compatible with Octave). The working environment and the usage of the software package will be presented in this paper.

この論文では、Matlabで記述されたオープンソースソフトウェアパッケージであるガウス過程回帰(GPLP)のローカルおよび並列計算ツールボックスの入門スタイルのドキュメントを示します(ただし、Octaveとも互換性があります)。この論文では、作業環境とソフトウェアパッケージの使用方法について説明します。

A Kernel Two-Sample Test
カーネルの 2 サンプル・テスト

We propose a framework for analyzing and comparing distributions, which we use to construct statistical tests to determine if two samples are drawn from different distributions. Our test statistic is the largest difference in expectations over functions in the unit ball of a reproducing kernel Hilbert space (RKHS), and is called the maximum mean discrepancy (MMD). We present two distribution-free tests based on large deviation bounds for the MMD, and a third test based on the asymptotic distribution of this statistic. The MMD can be computed in quadratic time, although efficient linear time approximations are available. Our statistic is an instance of an integral probability metric, and various classical metrics on distributions are obtained when alternative function classes are used in place of an RKHS. We apply our two-sample tests to a variety of problems, including attribute matching for databases using the Hungarian marriage method, where they perform strongly. Excellent performance is also obtained when comparing distributions over graphs, for which these are the first such tests.

私たちは、分布を分析および比較するためのフレームワークを提案し、これを使用して2つのサンプルが異なる分布から抽出されたかどうかを判断する統計的検定を構築します。検定統計量は、再生核ヒルベルト空間(RKHS)の単位球における関数の期待値の最大差であり、最大平均不一致(MMD)と呼ばれます。MMDの大偏差境界に基づく2つの分布フリー検定と、この統計量の漸近分布に基づく3番目の検定を示します。MMDは2次時間で計算できますが、効率的な線形時間近似も利用できます。我々の統計量は積分確率メトリックのインスタンスであり、RKHSの代わりに代替関数クラスを使用すると、分布に関するさまざまな古典的なメトリックが得られます。私たちは、2サンプル検定を、ハンガリー結婚法を使用したデータベースの属性マッチングなど、さまざまな問題に適用し、優れたパフォーマンスを発揮します。グラフ上の分布を比較する場合にも優れたパフォーマンスが得られますが、これはそのような検定としては初めてのものです。

A Case Study on Meta-Generalising: A Gaussian Processes Approach
メタジェネラライゼーションに関する事例研究:ガウス過程アプローチ

We propose a novel model for meta-generalisation, that is, performing prediction on novel tasks based on information from multiple different but related tasks. The model is based on two coupled Gaussian processes with structured covariance function; one model performs predictions by learning a constrained covariance function encapsulating the relations between the various training tasks, while the second model determines the similarity of new tasks to previously seen tasks. We demonstrate empirically on several real and synthetic data sets both the strengths of the approach and its limitations due to the distributional assumptions underpinning it.

私たちは、メタジェネラライゼーション、つまり、複数の異なるが関連するタスクからの情報に基づいて新しいタスクの予測を実行するための新しいモデルを提案します。このモデルは、構造化された共分散関数を持つ2つの結合ガウス過程に基づいています。1つのモデルは、さまざまな学習タスク間の関係をカプセル化する制約付き共分散関数を学習することで予測を実行し、2つ目のモデルは、新しいタスクと以前に見たタスクの類似性を判断します。私たちは、いくつかの実際のデータセットと合成データセットで、アプローチの長所と、それを支える分布の仮定によるその制限の両方を経験的に示しています。

Structured Sparsity and Generalization
構造化されたスパース性と一般化

We present a data dependent generalization bound for a large class of regularized algorithms which implement structured sparsity constraints. The bound can be applied to standard squared-norm regularization, the Lasso, the group Lasso, some versions of the group Lasso with overlapping groups, multiple kernel learning and other regularization schemes. In all these cases competitive results are obtained. A novel feature of our bound is that it can be applied in an infinite dimensional setting such as the Lasso in a separable Hilbert space or multiple kernel learning with a countable number of kernels.

私たちは、構造化されたスパース制約を実装する正規化アルゴリズムの大規模なクラスにバインドされたデータ依存の汎化を示します。この制限は、標準の二乗ノルム正則化、なげなわ、グループなげなわ、グループが重複するグループなわなわの一部のバージョン、複数のカーネル学習、およびその他の正則化スキームに適用できます。これらすべての場合において、競争力のある結果が得られます。私たちの境界の斬新な特徴は、分離可能なヒルベルト空間のなげなわや、可算数のカーネルを持つ複数のカーネル学習など、無限次元の設定に適用できることです。

Learning Algorithms for the Classification Restricted Boltzmann Machine
分類制限付きボルツマンマシンの学習アルゴリズム

Recent developments have demonstrated the capacity of restricted Boltzmann machines (RBM) to be powerful generative models, able to extract useful features from input data or construct deep artificial neural networks. In such settings, the RBM only yields a preprocessing or an initialization for some other model, instead of acting as a complete supervised model in its own right. In this paper, we argue that RBMs can provide a self-contained framework for developing competitive classifiers. We study the Classification RBM (ClassRBM), a variant on the RBM adapted to the classification setting. We study different strategies for training the ClassRBM and show that competitive classification performances can be reached when appropriately combining discriminative and generative training objectives. Since training according to the generative objective requires the computation of a generally intractable gradient, we also compare different approaches to estimating this gradient and address the issue of obtaining such a gradient for problems with very high dimensional inputs. Finally, we describe how to adapt the ClassRBM to two special cases of classification problems, namely semi-supervised and multitask learning.

最近の開発により、制限付きボルツマンマシン(RBM)が強力な生成モデルとなり、入力データから有用な特徴を抽出したり、人工ニューラルネットワークを構築したりできることが実証されています。このような設定では、RBMは完全な教師ありモデルとして機能するのではなく、他のモデルの前処理または初期化のみを行います。この論文では、RBMが競争力のある分類器を開発するための自己完結型のフレームワークを提供できると主張します。分類設定に適応したRBMのバリエーションである分類RBM (ClassRBM)について検討します。ClassRBMをトレーニングするためのさまざまな戦略を検討し、識別トレーニング目標と生成トレーニング目標を適切に組み合わせると、競争力のある分類パフォーマンスを達成できることを示します。生成目標に従ったトレーニングでは、一般的に扱いにくい勾配を計算する必要があるため、この勾配を推定するさまざまなアプローチを比較し、非常に高次元の入力を伴う問題でそのような勾配を取得する問題にも対処します。最後に、分類問題の2つの特殊なケース、つまり半教師あり学習とマルチタスク学習にClassRBMを適応させる方法について説明します。

Non-Sparse Multiple Kernel Fisher Discriminant Analysis
非スパース多重カーネルフィッシャー判別分析

Sparsity-inducing multiple kernel Fisher discriminant analysis (MK-FDA) has been studied in the literature. Building on recent advances in non-sparse multiple kernel learning (MKL), we propose a non-sparse version of MK-FDA, which imposes a general lp norm regularisation on the kernel weights. We formulate the associated optimisation problem as a semi-infinite program (SIP), and adapt an iterative wrapper algorithm to solve it. We then discuss, in light of latest advances in MKL optimisation techniques, several reformulations and optimisation strategies that can potentially lead to significant improvements in the efficiency and scalability of MK-FDA. We carry out extensive experiments on six datasets from various application areas, and compare closely the performance of lp MK-FDA, fixed norm MK-FDA, and several variants of SVM-based MKL (MK-SVM). Our results demonstrate that lp MK-FDA improves upon sparse MK-FDA in many practical situations. The results also show that on image categorisation problems, lp MK-FDA tends to outperform its SVM counterpart. Finally, we also discuss the connection between (MK-)FDA and (MK-)SVM, under the unified framework of regularised kernel machines.

スパース性誘導多重カーネルフィッシャー判別分析(MK-FDA)は文献で研究されてきました。非スパース多重カーネル学習(MKL)の最近の進歩に基づいて、カーネルの重みに一般的なlpノルム正則化を課すMK-FDAの非スパースバージョンを提案します。関連する最適化問題を半無限プログラム(SIP)として定式化し、反復ラッパーアルゴリズムを適用してこれを解決します。次に、MKL最適化手法の最新の進歩を考慮して、MK-FDAの効率とスケーラビリティを大幅に向上させる可能性のあるいくつかの再定式化と最適化戦略について説明します。さまざまなアプリケーション領域の6つのデータセットで広範な実験を行い、lp MK-FDA、固定ノルムMK-FDA、およびSVMベースのMKL (MK-SVM)のいくつかのバリアントのパフォーマンスを綿密に比較します。結果は、多くの実用的な状況でlp MK-FDAがスパースMK-FDAよりも優れていることを示しています。結果はまた、画像分類問題では、lp MK-FDAがSVMよりも優れている傾向があることを示しています。最後に、正規化カーネルマシンの統一されたフレームワークの下で、(MK-)FDAと(MK-)SVMの関係についても説明します。

A Primal-Dual Convergence Analysis of Boosting
ブースティングの主-双対収束解析

Boosting combines weak learners into a predictor with low empirical risk. Its dual constructs a high entropy distribution upon which weak learners and training labels are uncorrelated. This manuscript studies this primal-dual relationship under a broad family of losses, including the exponential loss of AdaBoost and the logistic loss, revealing: • Weak learnability aids the whole loss family: for any ε > 0, O(ln(1/ε)) iterations suffice to produce a predictor with empirical risk ε-close to the infimum; • The circumstances granting the existence of an empirical risk minimizer may be characterized in terms of the primal and dual problems, yielding a new proof of the known rate O(ln(1/ε)); • Arbitrary instances may be decomposed into the above two, granting rate O(1/ε), with a matching lower bound provided for the logistic loss.

ブースティングは、弱い学習器を経験的リスクの低い予測子に結合します。その双対性は、弱い学習器と学習ラベルが相関しない高いエントロピー分布を構築します。この原稿では、AdaBoostの指数関数的損失とロジスティック損失を含む、広範な損失ファミリーの下でこの原始双対関係を研究し、次のことを明らかにします:•弱い学習可能性は、損失ファミリー全体を支援します:任意のε>0について、O(ln(1/ε))の反復は、経験的リスクがεinfimumに近い予測子を生成するのに十分です。• 経験的リスク最小化装置の存在を認める状況は、主問題と双対問題の観点から特徴付けられ、既知のレートO(ln(1/ε))の新しい証明をもたらす可能性があります。• 任意のインスタンスを上記の2つに分解でき、付与レートO(1/ε)と、ロジスティック損失に一致する下限が提供されます。

ML-Flex: A Flexible Toolbox for Performing Classification Analyses In Parallel
ML-Flex: 分類解析を並行して実行するための柔軟なツールボックス

Motivated by a need to classify high-dimensional, heterogeneous data from the bioinformatics domain, we developed ML-Flex, a machine-learning toolbox that enables users to perform two-class and multi-class classification analyses in a systematic yet flexible manner. ML-Flex was written in Java but is capable of interfacing with third-party packages written in other programming languages. It can handle multiple input-data formats and supports a variety of customizations. ML-Flex provides implementations of various validation strategies, which can be executed in parallel across multiple computing cores, processors, and nodes. Additionally, ML-Flex supports aggregating evidence across multiple algorithms and data sets via ensemble learning. This open-source software package is freely available from http://mlflex.sourceforge.net.

バイオインフォマティクス領域から高次元の異種データを分類する必要性に動機付けられ、ユーザーが体系的かつ柔軟な方法で2クラスおよびマルチクラスの分類分析を実行できる機械学習ツールボックスであるML-Flexを開発しました。ML-FlexはJavaで記述されていますが、他のプログラミング言語で記述されたサードパーティパッケージとやり取りすることができます。複数の入力データ形式を処理でき、さまざまなカスタマイズをサポートします。ML-Flexは、さまざまな検証戦略の実装を提供し、複数のコンピューティングコア、プロセッサ、ノード間で並列に実行できます。さらに、ML-Flexは、アンサンブル学習により、複数のアルゴリズムとデータセット間で証拠を集約することをサポートしています。このオープンソースソフトウェアパッケージは、http://mlflex.sourceforge.netから無料で入手できます。

MULTIBOOST: A Multi-purpose Boosting Package
MULTIBOOST:多目的ブースティングパッケージ

The MULTIBOOST package provides a fast C++ implementation of multi-class/multi-label/multi-task boosting algorithms. It is based on ADABOOST.MH but it also implements popular cascade classifiers and FILTERBOOST. The package contains common multi-class base learners (stumps, trees, products, Haar filters). Further base learners and strong learners following the boosting paradigm can be easily implemented in a flexible framework.

MULTIBOOSTパッケージは、マルチクラス/マルチラベル/マルチタスクブースティングアルゴリズムの高速なC++実装を提供します。ADABOOSTに基づいています。MHですが、一般的なカスケード分類器とFILTERBOOSTも実装しています。このパッケージには、一般的なマルチクラス基本学習器(切り株、樹木、製品、ハールフィルター)が含まれています。ブースティングパラダイムに従ったさらなる基本学習器と強力な学習器は、柔軟なフレームワークで簡単に実装できます。

Metric and Kernel Learning Using a Linear Transformation
線形変換を使用したメトリックとカーネルの学習

Metric and kernel learning arise in several machine learning applications. However, most existing metric learning algorithms are limited to learning metrics over low-dimensional data, while existing kernel learning algorithms are often limited to the transductive setting and do not generalize to new data points. In this paper, we study the connections between metric learning and kernel learning that arise when studying metric learning as a linear transformation learning problem. In particular, we propose a general optimization framework for learning metrics via linear transformations, and analyze in detail a special case of our framework—that of minimizing the LogDet divergence subject to linear constraints. We then propose a general regularized framework for learning a kernel matrix, and show it to be equivalent to our metric learning framework. Our theoretical connections between metric and kernel learning have two main consequences: 1) the learned kernel matrix parameterizes a linear transformation kernel function and can be applied inductively to new data points, 2) our result yields a constructive method for kernelizing most existing Mahalanobis metric learning formulations. We demonstrate our learning approach by applying it to large-scale real world problems in computer vision, text mining and semi-supervised kernel dimensionality reduction.

メトリック学習とカーネル学習は、いくつかの機械学習アプリケーションで発生します。ただし、既存のメトリック学習アルゴリズムのほとんどは、低次元データに対するメトリック学習に限定されています。一方、既存のカーネル学習アルゴリズムは、多くの場合、トランスダクティブ設定に限定されており、新しいデータポイントに一般化されません。この論文では、メトリック学習を線形変換学習問題として研究する際に生じる、メトリック学習とカーネル学習の関係について検討します。特に、線形変換を介してメトリックを学習するための一般的な最適化フレームワークを提案し、線形制約の下でLogDetダイバージェンスを最小化するフレームワークの特殊なケースを詳細に分析します。次に、カーネルマトリックスを学習するための一般的な正規化フレームワークを提案し、それがメトリック学習フレームワークと同等であることを示します。メトリック学習とカーネル学習の理論的な関係には、2つの主な結果があります。1)学習されたカーネルマトリックスは線形変換カーネル関数をパラメーター化し、新しいデータポイントに帰納的に適用できます。2)結果から、既存のほとんどのマハラノビスメトリック学習定式化をカーネル化するための構成的な方法が得られます。私たちは、コンピュータービジョン、テキストマイニング、半教師ありカーネル次元削減における大規模な現実世界の問題に学習アプローチを適用することで、その効果を実証します。

Eliminating Spammers and Ranking Annotators for Crowdsourced Labeling Tasks
スパマーを排除し、クラウドソーシングによるラベリングタスクのアノテーターをランク付けする

With the advent of crowdsourcing services it has become quite cheap and reasonably effective to get a data set labeled by multiple annotators in a short amount of time. Various methods have been proposed to estimate the consensus labels by correcting for the bias of annotators with different kinds of expertise. Since we do not have control over the quality of the annotators, very often the annotations can be dominated by spammers, defined as annotators who assign labels randomly without actually looking at the instance. Spammers can make the cost of acquiring labels very expensive and can potentially degrade the quality of the final consensus labels. In this paper we propose an empirical Bayesian algorithm called SpEM that iteratively eliminates the spammers and estimates the consensus labels based only on the good annotators. The algorithm is motivated by defining a spammer score that can be used to rank the annotators. Experiments on simulated and real data show that the proposed approach is better than (or as good as) the earlier approaches in terms of the accuracy and uses a significantly smaller number of annotators.

クラウドソーシングサービスの出現により、複数の注釈者が短時間でデータセットにラベルを付ける作業が、非常に安価で、かなり効率的になりました。専門知識の異なる注釈者の偏りを補正することで、コンセンサスラベルを推定するさまざまな方法が提案されています。注釈者の質を制御できないため、注釈はスパマーによって占められることが多くなります。スパマーとは、インスタンスを実際に確認せずにラベルをランダムに割り当てる注釈者と定義されます。スパマーは、ラベルの取得コストを非常に高くし、最終的なコンセンサスラベルの品質を低下させる可能性があります。この論文では、スパマーを繰り返し排除し、優れた注釈者のみに基づいてコンセンサスラベルを推定する、SpEMと呼ばれる経験的ベイズアルゴリズムを提案します。このアルゴリズムは、注釈者のランク付けに使用できるスパマースコアを定義することを目的としています。シミュレーションと実際のデータでの実験では、提案されたアプローチは精度の点で以前のアプローチよりも優れており（または同等であり）、使用するアノテーターの数が大幅に少ないことが示されています。

Multi-Assignment Clustering for Boolean Data
ブールデータの複数代入クラスタリング

We propose a probabilistic model for clustering Boolean data where an object can be simultaneously assigned to multiple clusters. By explicitly modeling the underlying generative process that combines the individual source emissions, highly structured data are expressed with substantially fewer clusters compared to single-assignment clustering. As a consequence, such a model provides robust parameter estimators even when the number of samples is low. We extend the model with different noise processes and demonstrate that maximum-likelihood estimation with multiple assignments consistently infers source parameters more accurately than single-assignment clustering. Our model is primarily motivated by the task of role mining for role-based access control, where users of a system are assigned one or more roles. In experiments with real-world access-control data, our model exhibits better generalization performance than state-of-the-art approaches.

私たちは、オブジェクトを複数のクラスターに同時に割り当てることができるブールデータをクラスタリングするための確率モデルを提案します。個々の発生源の排出量を組み合わせる基礎となる生成プロセスを明示的にモデル化することにより、高度に構造化されたデータは、単一割り当てクラスタリングと比較して大幅に少ないクラスターで表現されます。結果として、このようなモデルは、サンプル数が少ない場合でもロバストなパラメーター推定量を提供します。さまざまなノイズプロセスでモデルを拡張し、複数の割り当てによる最尤推定が、単一割り当てクラスタリングよりも一貫してソースパラメーターを推論することを示しています。私たちのモデルは、主に、システムのユーザーに1つ以上のロールが割り当てられるロールベースのアクセス制御のためのロールマイニングのタスクによって動機付けられています。実際のアクセス制御データを使用した実験では、私たちのモデルは最先端のアプローチよりも優れた汎化パフォーマンスを示します。

Online Learning in the Embedded Manifold of Low-rank Matrices
低ランク行列の組み込み多様体におけるオンライン学習

When learning models that are represented in matrix forms, enforcing a low-rank constraint can dramatically improve the memory and run time complexity, while providing a natural regularization of the model. However, naive approaches to minimizing functions over the set of low-rank matrices are either prohibitively time consuming (repeated singular value decomposition of the matrix) or numerically unstable (optimizing a factored representation of the low-rank matrix). We build on recent advances in optimization over manifolds, and describe an iterative online learning procedure, consisting of a gradient step, followed by a second-order retraction back to the manifold. While the ideal retraction is costly to compute, and so is the projection operator that approximates it, we describe another retraction that can be computed efficiently. It has run time and memory complexity of O ( (n+m)k ) for a rank-k matrix of dimension m X n, when using an online procedure with rank-one gradients. We use this algorithm, LORETA, to learn a matrix-form similarity measure over pairs of documents represented as high dimensional vectors. LORETA improves the mean average precision over a passive-aggressive approach in a factorized model, and also improves over a full model trained on pre-selected features using the same memory requirements. We further adapt LORETA to learn positive semi-definite low-rank matrices, providing an online algorithm for low-rank metric learning. LORETA also shows consistent improvement over standard weakly supervised methods in a large (1600 classes and 1 million images, using ImageNet) multi-label image classification task.

行列形式で表現されるモデルを学習する場合、低ランク制約を適用すると、メモリと実行時間の複雑さを大幅に改善できると同時に、モデルの自然な正則化が実現します。ただし、低ランク行列のセットで関数を最小化する単純なアプローチは、時間がかかりすぎるか(行列の特異値分解を繰り返す)、数値的に不安定です(低ランク行列の因数分解表現を最適化する)。ここでは、多様体上の最適化における最近の進歩を基に、勾配ステップと、それに続く多様体への2次後退で構成される反復オンライン学習手順について説明します。理想的な後退は計算コストが高く、それを近似する射影演算子も同様にコストがかかりますが、ここでは効率的に計算できる別の後退について説明します。ランク1の勾配を持つオンライン手順を使用する場合、次元m X nのランクk行列の実行時間とメモリの複雑さはO ( (n+m)k )です。このアルゴリズムLORETAを使用して、高次元ベクトルとして表されるドキュメントのペアの行列形式の類似度測定を学習します。LORETAは、因数分解モデルにおけるパッシブアグレッシブアプローチよりも平均精度を向上させ、同じメモリ要件を使用して事前に選択された機能でトレーニングされた完全なモデルよりも精度を向上させます。さらに、LORETAを適応させて、半正定値の低ランク行列を学習し、低ランクメトリック学習のオンラインアルゴリズムを提供します。LORETAは、大規模な(ImageNetを使用した1600クラスと100万枚の画像)マルチラベル画像分類タスクにおいて、標準的な弱く監視された方法よりも一貫して改善されています。

Minimax-Optimal Rates For Sparse Additive Models Over Kernel Classes Via Convex Programming
凸計画法によるカーネルクラス上のスパース加法モデルに対するミニマックス-最適レート

Sparse additive models are families of d-variate functions with the additive decomposition f* = ∑j ∈ S f*j, where S is an unknown subset of cardinality s << d. In this paper, we consider the case where each univariate component function f*j lies in a reproducing kernel Hilbert space (RKHS), and analyze a method for estimating the unknown function f* based on kernels combined with l1-type convex regularization. Working within a high-dimensional framework that allows both the dimension d and sparsity s to increase with n, we derive convergence rates in the L2(P) and L2(Pn) norms over the class Fd,s,H of sparse additive models with each univariate function f*j in the unit ball of a univariate RKHS with bounded kernel function. We complement our upper bounds by deriving minimax lower bounds on the L2(P) error, thereby showing the optimality of our method. Thus, we obtain optimal minimax rates for many interesting classes of sparse additive models, including polynomials, splines, and Sobolev classes. We also show that if, in contrast to our univariate conditions, the d-variate function class is assumed to be globally bounded, then much faster estimation rates are possible for any sparsity s = Ω(√n), showing that global boundedness is a significant restriction in the high-dimensional setting.

スパース加法モデルは、加法分解f* =∑j∈S f*jを持つd変量関数の族であり、ここでSは濃度s << dの未知の部分集合です。この論文では、各単変量成分関数f*jが再生カーネルヒルベルト空間(RKHS)にある場合を検討し、l1型凸正規化と組み合わせたカーネルに基づいて未知の関数f*を推定する方法を分析します。次元dとスパース性sの両方がnとともに増加する高次元フレームワーク内で作業し、有界カーネル関数を持つ単変量RKHSの単位球内の各単変量関数f*jを持つスパース加法モデルのクラスFd,s,H上のL2(P)およびL2(Pn)ノルムの収束率を導出します。L2(P)誤差のミニマックス下限を導出することで上限を補完し、これにより方法の最適性を示します。このようにして、多項式、スプライン、ソボレフクラスなど、多くの興味深いクラスのスパース加法モデルに対して最適なミニマックス率が得られます。また、単変量条件とは対照的に、d変量関数クラスがグローバルに有界であると仮定すると、任意のスパースs =Ω(√n)に対してはるかに高速な推定率が可能になることを示し、グローバル有界性が高次元設定における重要な制約であることを示しています。

Bounding the Probability of Error for High Precision Optical Character Recognition
高精度光学式文字認識のための誤差確率の境界化

We consider a model for which it is important, early in processing, to estimate some variables with high precision, but perhaps at relatively low recall. If some variables can be identified with near certainty, they can be conditioned upon, allowing further inference to be done efficiently. Specifically, we consider optical character recognition (OCR) systems that can be bootstrapped by identifying a subset of correctly translated document words with very high precision. This “clean set” is subsequently used as document-specific training data. While OCR systems produce confidence measures for the identity of each letter or word, thresholding these values still produces a significant number of errors.We introduce a novel technique for identifying a set of correct words with very high precision. Rather than estimating posterior probabilities, we bound the probability that any given word is incorrect using an approximate worst case analysis. We give empirical results on a data set of difficult historical newspaper scans, demonstrating that our method for identifying correct words makes only two errors in 56 documents. Using document-specific character models generated from this data, we are able to reduce the error over properly segmented characters by 34.1% from an initial OCR system’s translation.

私たちは、処理の初期段階で、いくつかの変数を高精度で推定することが重要であるが、再現率は比較的低い可能性があるモデルについて検討します。いくつかの変数をほぼ確実に識別できる場合は、それらの変数を条件として、さらに推論を効率的に行うことができます。具体的には、正しく翻訳された文書の単語のサブセットを非常に高精度で識別することでブートストラップできる光学式文字認識(OCR)システムについて検討します。この「クリーンセット」は、その後、文書固有のトレーニングデータとして使用されます。OCRシステムは各文字または単語のIDの信頼度尺度を生成しますが、これらの値をしきい値化すると、依然としてかなりの数のエラーが発生します。ここでは、非常に高精度で正しい単語のセットを識別するための新しい手法を紹介します。事後確率を推定するのではなく、近似的な最悪ケース分析を使用して、特定の単語が間違っている確率を制限します。難しい過去の新聞スキャンのデータセットで実験結果を示し、正しい単語を識別するためのこの方法では、56の文書で2つのエラーしか発生しないことを示します。このデータから生成されたドキュメント固有の文字モデルを使用することで、適切にセグメント化された文字のエラーを、初期のOCRシステムの翻訳から34.1%削減できます。

Noise-Contrastive Estimation of Unnormalized Statistical Models, with Applications to Natural Image Statistics
正規化されていない統計モデルのノイズコントラスト推定と自然画像統計への応用

We consider the task of estimating, from observed data, a probabilistic model that is parameterized by a finite number of parameters. In particular, we are considering the situation where the model probability density function is unnormalized. That is, the model is only specified up to the partition function. The partition function normalizes a model so that it integrates to one for any choice of the parameters. However, it is often impossible to obtain it in closed form. Gibbs distributions, Markov and multi-layer networks are examples of models where analytical normalization is often impossible. Maximum likelihood estimation can then not be used without resorting to numerical approximations which are often computationally expensive. We propose here a new objective function for the estimation of both normalized and unnormalized models. The basic idea is to perform nonlinear logistic regression to discriminate between the observed data and some artificially generated noise. With this approach, the normalizing partition function can be estimated like any other parameter. We prove that the new estimation method leads to a consistent (convergent) estimator of the parameters. For large noise sample sizes, the new estimator is furthermore shown to behave like the maximum likelihood estimator. In the estimation of unnormalized models, there is a trade-off between statistical and computational performance. We show that the new method strikes a competitive trade-off in comparison to other estimation methods for unnormalized models. As an application to real data, we estimate novel two-layer models of natural image statistics with spline nonlinearities.

私たちは、観測データから、有限個のパラメータでパラメータ化された確率モデルを推定するタスクについて考えます。特に、モデルの確率密度関数が正規化されていない状況について考えます。つまり、モデルはパーティション関数までしか指定されません。パーティション関数は、モデルを正規化して、パラメータの任意の選択に対して1つに統合します。ただし、閉じた形式で取得することは多くの場合不可能です。ギブス分布、マルコフ、および多層ネットワークは、解析的正規化が不可能なことが多いモデルの例です。その場合、最大尤度推定は、多くの場合計算コストの高い数値近似に頼らなければ使用できません。ここでは、正規化モデルと正規化されていないモデルの両方を推定するための新しい目的関数を提案します。基本的な考え方は、非線形ロジスティック回帰を実行して、観測データと人工的に生成されたノイズを区別することです。このアプローチでは、正規化パーティション関数を他のパラメータと同様に推定できます。新しい推定方法が、パラメータの一貫した(収束する)推定値につながることを証明します。さらに、ノイズサンプルサイズが大きい場合、新しい推定量は最大尤度推定量のように動作することが示されています。正規化されていないモデルの推定では、統計的パフォーマンスと計算パフォーマンスの間にトレードオフがあります。新しい方法は、正規化されていないモデルに対する他の推定方法と比較して、競争力のあるトレードオフを実現していることを示しています。実際のデータへの応用として、スプライン非線形性を持つ自然画像統計の新しい2層モデルを推定します。

Random Search for Hyper-Parameter Optimization
ハイパーパラメータ最適化のためのランダム探索

Grid search and manual search are the most widely used strategies for hyper-parameter optimization. This paper shows empirically and theoretically that randomly chosen trials are more efficient for hyper-parameter optimization than trials on a grid. Empirical evidence comes from a comparison with a large previous study that used grid search and manual search to configure neural networks and deep belief networks. Compared with neural networks configured by a pure grid search, we find that random search over the same domain is able to find models that are as good or better within a small fraction of the computation time. Granting random search the same computational budget, random search finds better models by effectively searching a larger, less promising configuration space. Compared with deep belief networks configured by a thoughtful combination of manual search and grid search, purely random search over the same 32-dimensional configuration space found statistically equal performance on four of seven data sets, and superior performance on one of seven. A Gaussian process analysis of the function from hyper-parameters to validation set performance reveals that for most data sets only a few of the hyper-parameters really matter, but that different hyper-parameters are important on different data sets. This phenomenon makes grid search a poor choice for configuring algorithms for new data sets. Our analysis casts some light on why recent “High Throughput” methods achieve surprising success−they appear to search through a large number of hyper-parameters because most hyper-parameters do not matter much. We anticipate that growing interest in large hierarchical models will place an increasing burden on techniques for hyper-parameter optimization; this work shows that random search is a natural baseline against which to judge progress in the development of adaptive (sequential) hyper-parameter optimization algorithms.

グリッド検索と手動検索は、ハイパーパラメータの最適化に最も広く使用されている戦略です。この論文では、経験的かつ理論的に、ランダムに選択された試行は、グリッド上の試行よりもハイパーパラメータの最適化に効率的であることを示しています。経験的証拠は、グリッド検索と手動検索を使用してニューラルネットワークとディープビリーフネットワークを構成した大規模な以前の研究との比較から得られます。純粋なグリッド検索で構成されたニューラルネットワークと比較して、同じドメインでのランダム検索では、わずかな計算時間で同等かそれ以上のモデルを見つけることができることがわかりました。ランダム検索に同じ計算予算を与えると、ランダム検索は、より大きく見込みの少ない構成空間を効果的に検索することで、より良いモデルを見つけます。手動検索とグリッド検索を慎重に組み合わせて構成されたディープビリーフネットワークと比較して、同じ32次元構成空間での純粋なランダム検索では、7つのデータセットのうち4つで統計的に同等のパフォーマンスが得られ、7つのうち1つで優れたパフォーマンスが得られました。ハイパーパラメータから検証セットのパフォーマンスまでの関数のガウス過程分析により、ほとんどのデータセットでは実際に重要なハイパーパラメータは少数であるが、異なるデータセットでは異なるハイパーパラメータが重要であることが明らかになりました。この現象により、グリッド検索は新しいデータセットのアルゴリズムを構成するための適切な選択ではありません。私たちの分析は、最近の「高スループット」方法が驚くべき成功を収めている理由をいくらか明らかにしています。これらの方法は、ほとんどのハイパーパラメータがあまり重要ではないため、多数のハイパーパラメータを検索しているように見えます。大規模な階層モデルへの関心が高まると、ハイパーパラメータの最適化手法にかかる負担が大きくなると予想されます。この研究では、ランダム検索が、適応型(シーケンシャル)ハイパーパラメータ最適化アルゴリズムの開発の進捗を判断するための自然な基準であることを示しています。

Active Learning via Perfect Selective Classification
完全選択的分類によるアクティブラーニング

We discover a strong relation between two known learning models: stream-based active learning and perfect selective classification (an extreme case of ‘classification with a reject option’). For these models, restricted to the realizable case, we show a reduction of active learning to selective classification that preserves fast rates. Applying this reduction to recent results for selective classification, we derive exponential target-independent label complexity speedup for actively learning general (non-homogeneous) linear classifiers when the data distribution is an arbitrary high dimensional mixture of Gaussians. Finally, we study the relation between the proposed technique and existing label complexity measures, including teaching dimension and disagreement coefficient.

私たちは、ストリームベースのアクティブラーニングと完全選択的分類(「リジェクトオプションによる分類」の極端なケース)という2つの既知の学習モデルの間に強い関係があることを発見しました。これらのモデルでは、実現可能なケースに限定して、アクティブラーニングを高速レートを維持する選択的分類に縮小することを示します。この削減を選択的分類の最近の結果に適用すると、データ分布がガウスの任意の高次元混合物である場合に、一般的な(非均質な)線形分類器を積極的に学習するための指数関数的なターゲットに依存しないラベルの複雑さの高速化を導き出します。最後に、提案された手法と既存のラベル複雑性尺度(教育次元や不一致係数など)との関係を研究します。

Multi Kernel Learning with Online-Batch Optimization
オンラインバッチ最適化によるマルチカーネル学習

In recent years there has been a lot of interest in designing principled classification algorithms over multiple cues, based on the intuitive notion that using more features should lead to better performance. In the domain of kernel methods, a principled way to use multiple features is the Multi Kernel Learning (MKL) approach. Here we present a MKL optimization algorithm based on stochastic gradient descent that has a guaranteed convergence rate. We directly solve the MKL problem in the primal formulation. By having a p-norm formulation of MKL, we introduce a parameter that controls the level of sparsity of the solution, while leading to an easier optimization problem. We prove theoretically and experimentally that 1) our algorithm has a faster convergence rate as the number of kernels grows; 2) the training complexity is linear in the number of training examples; 3) very few iterations are sufficient to reach good solutions. Experiments on standard benchmark databases support our claims.

近年、より多くの特徴を使用すればパフォーマンスが向上するという直感的な概念に基づいて、複数の手がかりに対して原理的な分類アルゴリズムを設計することに多くの関心が寄せられています。カーネル法の分野では、複数の特徴を使用する原理的な方法は、マルチカーネル学習(MKL)アプローチです。ここでは、収束率が保証された確率的勾配降下法に基づくMKL最適化アルゴリズムを紹介します。MKL問題を主定式化で直接解決します。MKLのpノルム定式化を行うことで、ソリューションのスパース性のレベルを制御するパラメーターを導入し、最適化問題を容易にします。理論的および実験的に、1)カーネルの数が増えるにつれてアルゴリズムの収束率が上がること、2)トレーニングの複雑さがトレーニング例の数に比例すること、3)優れたソリューションに到達するのに非常に少ない反復で十分であることを証明しています。標準ベンチマークデータベースでの実験は、私たちの主張を裏付けています。

Active Clustering of Biological Sequences
生物学的配列のアクティブクラスタリング

Given a point set S and an unknown metric d on S, we study the problem of efficiently partitioning S into k clusters while querying few distances between the points. In our model we assume that we have access to one versus all queries that given a point s ∈ S return the distances between s and all other points. We show that given a natural assumption about the structure of the instance, we can efficiently find an accurate clustering using only O(k) distance queries. Our algorithm uses an active selection strategy to choose a small set of points that we call landmarks, and considers only the distances between landmarks and other points to produce a clustering. We use our procedure to cluster proteins by sequence similarity. This setting nicely fits our model because we can use a fast sequence database search program to query a sequence against an entire data set. We conduct an empirical study that shows that even though we query a small fraction of the distances between the points, we produce clusterings that are close to a desired clustering given by manual classification.

点集合SとS上の未知のメトリックdが与えられた場合、点間の距離を少数照会しながらSをk個のクラスターに効率的に分割する問題を研究します。このモデルでは、点s∈Sが与えられた場合にsと他のすべての点の間の距離を返す1つのクエリとすべてのクエリにアクセスできると想定します。インスタンスの構造に関する自然な仮定が与えられた場合、O(k)距離クエリのみを使用して正確なクラスタリングを効率的に見つけることができることを示します。このアルゴリズムは、アクティブ選択戦略を使用してランドマークと呼ぶ点の小さな集合を選択し、ランドマークと他の点の間の距離のみを考慮してクラスタリングを生成します。この手順を使用して、タンパク質を配列の類似性によってクラスタリングします。この設定はモデルによく適合します。高速な配列データベース検索プログラムを使用して、データセット全体に対して配列を照会できるためです。実証研究を実施した結果、点間の距離のごく一部を照会した場合でも、手動分類によって与えられた目的のクラスタリングに近いクラスタリングが生成されることがわかりました。

Optimal Distributed Online Prediction Using Mini-Batches
ミニバッチを用いた最適分布オンライン予測

Online prediction methods are typically presented as serial algorithms running on a single processor. However, in the age of web-scale prediction problems, it is increasingly common to encounter situations where a single processor cannot keep up with the high rate at which inputs arrive. In this work, we present the distributed mini-batch algorithm, a method of converting many serial gradient-based online prediction algorithms into distributed algorithms. We prove a regret bound for this method that is asymptotically optimal for smooth convex loss functions and stochastic inputs. Moreover, our analysis explicitly takes into account communication latencies between nodes in the distributed environment. We show how our method can be used to solve the closely-related distributed stochastic optimization problem, achieving an asymptotically linear speed-up over multiple processors. Finally, we demonstrate the merits of our approach on a web-scale online prediction problem.

オンライン予測手法は、通常、1つのプロセッサ上で実行されるシリアルアルゴリズムとして提供されます。しかし、Webスケールの予測問題の時代には、1つのプロセッサが入力の到着率の高さに追いつかない状況に遭遇することがますます一般的になっています。この作業では、多くの直列勾配ベースのオンライン予測アルゴリズムを分散アルゴリズムに変換する方法である分散ミニバッチアルゴリズムを紹介します。この方法が滑らかな凸損失関数と確率的入力に対して漸近的に最適であることに後悔の範囲があることを証明します。さらに、この分析では、分散環境内のノード間の通信遅延を明示的に考慮しています。この手法を使用して、密接に関連する分布確率最適化問題を解く方法を示し、複数のプロセッサで漸近的に線形の高速化を実現します。最後に、Webスケールのオンライン予測問題に対するアプローチのメリットを示します。

An Active Learning Algorithm for Ranking from Pairwise Preferences with an Almost Optimal Query Complexity
ほぼ最適なクエリ複雑性を持つペアワイズ選好からのランク付けのためのアクティブラーニングアルゴリズム

Given a set V of n elements we wish to linearly order them given pairwise preference labels which may be non-transitive (due to irrationality or arbitrary noise). The goal is to linearly order the elements while disagreeing with as few pairwise preference labels as possible. Our performance is measured by two parameters: The number of disagreements (loss) and the query complexity (number of pairwise preference labels). Our algorithm adaptively queries at most O(ε-6n log5 n) preference labels for a regret of ε times the optimal loss. As a function of n, this is asymptotically better than standard (non-adaptive) learning bounds achievable for the same problem. Our main result takes us a step closer toward settling an open problem posed by learning-to-rank (from pairwise information) theoreticians and practitioners: What is a provably correct way to sample preference labels? To further show the power and practicality of our solution, we analyze a typical test case in which a large margin linear relaxation is used for efficiently solving the simpler learning problems in our decomposition.

n個の要素からなる集合Vが与えられ、非推移的である可能性のあるペアワイズ選好ラベル(不合理性または任意のノイズのため)を与えられた要素を線形に順序付けたいとします。目標は、できるだけ少ないペアワイズ選好ラベルで不一致な要素を線形に順序付けすることです。パフォーマンスは、不一致の数(損失)とクエリの複雑さ(ペアワイズ選好ラベルの数)という2つのパラメータで測定されます。アルゴリズムは、最適損失の ε 倍の後悔に対して、最大O(ε-6n log5 n)個の選好ラベルを適応的にクエリします。nの関数として、これは同じ問題で達成可能な標準的な(非適応型の)学習境界よりも漸近的に優れています。主な結果は、(ペアワイズ情報からの)ランク付け学習の理論家および実践者によって提起された未解決の問題の解決に一歩近づきます。選好ラベルをサンプリングする証明可能な正しい方法は何ですか?私たちのソリューションの威力と実用性をさらに示すために、分解におけるより単純な学習問題を効率的に解決するために大きなマージンを持つ線形緩和を使用する典型的なテストケースを分析します。

Refinement of Operator-valued Reproducing Kernels
演算子値による再生カーネルの改良

This paper studies the construction of a refinement kernel for a given operator-valued reproducing kernel such that the vector-valued reproducing kernel Hilbert space of the refinement kernel contains that of the given kernel as a subspace. The study is motivated from the need of updating the current operator-valued reproducing kernel in multi-task learning when underfitting or overfitting occurs. Numerical simulations confirm that the established refinement kernel method is able to meet this need. Various characterizations are provided based on feature maps and vector-valued integral representations of operator-valued reproducing kernels. Concrete examples of refining translation invariant and finite Hilbert-Schmidt operator-valued reproducing kernels are provided. Other examples include refinement of Hessian of scalar-valued translation-invariant kernels and transformation kernels. Existence and properties of operator-valued reproducing kernels preserved during the refinement process are also investigated.

この論文では、与えられた演算子値再生カーネルに対する改良カーネルの構築について検討し、改良カーネルのベクトル値再生カーネルヒルベルト空間が、与えられたカーネルのそれを部分空間として含むようにします。この研究では、マルチタスク学習でアンダーフィッティングまたはオーバーフィッティングが発生した場合に、現在の演算子値再生カーネルを更新する必要性から動機付けられています。数値シミュレーションにより、確立された改良カーネル法がこのニーズを満たすことができることが確認されています。演算子値再生カーネルの特徴マップとベクトル値積分表現に基づいて、さまざまな特性評価が提供されています。変換不変および有限ヒルベルト-シュミット演算子値再生カーネルの改良の具体的な例が示されています。他の例には、スカラー値の変換不変カーネルと変換カーネルのヘッセ行列の改良が含まれます。改良プロセス中に保持される演算子値再生カーネルの存在と特性も調査されています。

Plug-in Approach to Active Learning
アクティブラーニングへのプラグインアプローチ

We present a new active learning algorithm based on nonparametric estimators of the regression function. Our investigation provides probabilistic bounds for the rates of convergence of the generalization error achievable by proposed method over a broad class of underlying distributions. We also prove minimax lower bounds which show that the obtained rates are almost tight.

私たちは、回帰関数のノンパラメトリック推定量に基づく新しいアクティブラーニングアルゴリズムを紹介します。私たちの調査は、提案された方法によって達成可能な一般化誤差の収束率について、広範なクラスの基礎となる分布に対する確率的な限界を提供します。また、得られたレートがほぼタイトであることを示すミニマックス下限も証明します。

Conditional Likelihood Maximisation: A Unifying Framework for Information Theoretic Feature Selection
条件付き尤度最大化:情報理論的特徴選択のための統一フレームワーク

We present a unifying framework for information theoretic feature selection, bringing almost two decades of research on heuristic filter criteria under a single theoretical interpretation. This is in response to the question: “what are the implicit statistical assumptions of feature selection criteria based on mutual information?”. To answer this, we adopt a different strategy than is usual in the feature selection literature−instead of trying to define a criterion, we derive one, directly from a clearly specified objective function: the conditional likelihood of the training labels. While many hand-designed heuristic criteria try to optimize a definition of feature ‘relevancy’ and ‘redundancy’, our approach leads to a probabilistic framework which naturally incorporates these concepts. As a result we can unify the numerous criteria published over the last two decades, and show them to be low-order approximations to the exact (but intractable) optimisation problem. The primary contribution is to show that common heuristics for information based feature selection (including Markov Blanket algorithms as a special case) are approximate iterative maximisers of the conditional likelihood. A large empirical study provides strong evidence to favour certain classes of criteria, in particular those that balance the relative size of the relevancy/redundancy terms. Overall we conclude that the JMI criterion (Yang and Moody, 1999; Meyer et al., 2008) provides the best tradeoff in terms of accuracy, stability, and flexibility with small data samples.

私たちは、情報理論的特徴選択の統一フレームワークを提示し、ヒューリスティックフィルタ基準に関するほぼ20年の研究を単一の理論的解釈の下にまとめます。これは、「相互情報量に基づく特徴選択基準の暗黙の統計的仮定は何か?」という質問に対する回答です。これに答えるために、我々は特徴選択の文献で通常採用されているものとは異なる戦略を採用します。基準を定義しようとする代わりに、明確に指定された目的関数(トレーニングラベルの条件付き尤度)から直接基準を導き出します。多くの手作業で設計されたヒューリスティック基準は、特徴の「関連性」と「冗長性」の定義を最適化しようとしますが、我々のアプローチは、これらの概念を自然に組み込んだ確率的フレームワークにつながります。その結果、過去20年間に公開された多数の基準を統一し、それらが正確な(しかし扱いにくい)最適化問題に対する低次の近似であることを示すことができます。主な貢献は、情報に基づく特徴選択の一般的なヒューリスティック(特別なケースとしてマルコフブランケットアルゴリズムを含む)が条件付き尤度の近似反復最大化器であることを示すことです。大規模な実証研究により、特定のクラスの基準、特に関連性/冗長性の用語の相対的なサイズのバランスをとる基準が有利であるという強力な証拠が得られました。全体として、JMI基準(YangおよびMoody、1999年、Meyer他、2008年)は、小規模なデータサンプルでの精度、安定性、柔軟性の点で最適なトレードオフを提供するという結論に達しました。

Distance Metric Learning with Eigenvalue Optimization
固有値最適化による距離メトリクス学習

The main theme of this paper is to develop a novel eigenvalue optimization framework for learning a Mahalanobis metric. Within this context, we introduce a novel metric learning approach called DML-eig which is shown to be equivalent to a well-known eigenvalue optimization problem called minimizing the maximal eigenvalue of a symmetric matrix (Overton, 1988; Lewis and Overton, 1996). Moreover, we formulate LMNN (Weinberger et al., 2005), one of the state-of-the-art metric learning methods, as a similar eigenvalue optimization problem. This novel framework not only provides new insights into metric learning but also opens new avenues to the design of efficient metric learning algorithms. Indeed, first-order algorithms are developed for DML-eig and LMNN which only need the computation of the largest eigenvector of a matrix per iteration. Their convergence characteristics are rigorously established. Various experiments on benchmark data sets show the competitive performance of our new approaches. In addition, we report an encouraging result on a difficult and challenging face verification data set called Labeled Faces in the Wild (LFW).

この論文の主なテーマは、マハラノビス計量を学習するための新しい固有値最適化フレームワークを開発することです。この文脈では、対称行列の最大固有値の最小化と呼ばれるよく知られた固有値最適化問題(Overton、1988年、LewisおよびOverton、1996年)と同等であることが示されたDML-eigと呼ばれる新しい計量学習アプローチを紹介します。さらに、最先端の計量学習方法の1つであるLMNN (Weinbergerら、2005年)を同様の固有値最適化問題として定式化します。この新しいフレームワークは、計量学習への新しい洞察を提供するだけでなく、効率的な計量学習アルゴリズムの設計への新しい道を開きます。実際、反復ごとに行列の最大固有ベクトルを計算するだけでよいDML-eigおよびLMNN用の1次アルゴリズムが開発されています。それらの収束特性は厳密に確立されています。ベンチマークデータセットでのさまざまな実験により、新しいアプローチの競争力のあるパフォーマンスが示されました。さらに、Labeled Faces in the Wild (LFW)と呼ばれる難しくて挑戦的な顔検証データセットで有望な結果が得られたことを報告します。

関連記事