Journal of Machine Learning Research Papers: Volume 9の論文一覧

Journal of Machine Learning Research Papers Volume 9に記載されている内容を一覧にまとめ、機械翻訳を交えて日本語化し掲載します。
Structural Learning of Chain Graphs via Decomposition
分解による鎖グラフの構造学習

Chain graphs present a broad class of graphical models for descriptionof conditional independence structures, including both Markov networksand Bayesian networks as special cases. In this paper, we propose acomputationally feasible method for the structural learning of chaingraphs based on the idea of decomposing the learning problem into aset of smaller scale problems on its decomposed subgraphs. Thedecomposition requires conditional independencies but does not requirethe separators to be complete subgraphs. Algorithms for both skeletonrecovery and complex arrow orientation are presented. Simulationsunder a variety of settings demonstrate the competitive performance ofour method, especially when the underlying graph is sparse.

チェーングラフは、マルコフネットワークとベイジアンネットワークの両方を特殊なケースとして含む、条件付き独立構造を記述するための幅広いグラフィカルモデルを提供します。この論文では、学習問題をその分解された部分グラフ上の小さなスケールの問題に分解するという考え方に基づいて、チェーングラフの構造学習のための計算的に実現可能な方法を提案します。分解には条件付きの独立性が必要ですが、セパレータが完全なサブグラフである必要はありません。skeletonrecoveryと複素矢印の向きの両方のアルゴリズムが表示されます。さまざまな設定でのシミュレーションは、特に基礎となるグラフがまばらな場合に、私たちの方法の競争力のあるパフォーマンスを示しています。

Magic Moments for Structured Output Prediction
構造化された出力予測のための魔法の瞬間

Most approaches to structured output prediction rely on a hypothesis spaceof prediction functions that compute their output by maximizinga linear scoring function.In this paper we present two novel learning algorithms for thishypothesis class, and a statistical analysis of their performance.The methods rely on efficiently computing the first two momentsof the scoring function over the output space, and using them tocreate convex objective functions for training.We report extensive experimental results for sequence alignment,named entity recognition, and RNA secondary structure prediction.

構造化された出力予測のほとんどのアプローチは、線形スコアリング関数を最大化することによって出力を計算する予測関数の仮説空間に依存しています。この論文では、この仮説クラスのための2つの新しい学習アルゴリズムと、それらのパフォーマンスの統計分析を紹介します。この手法は、出力空間に対するスコアリング関数の最初の2つのモーメントを効率的に計算し、それらを使用してトレーニング用の凸目的関数を作成することに依存しています。配列アラインメント、名前付きエンティティ認識、およびRNA二次構造予測に関する広範な実験結果を報告します。

Robust Submodular Observation Selection
ロバストなサブモジュラー観察選択

In many applications, one has to actively select among a set ofexpensive observations before making an informed decision. Forexample, in environmental monitoring, we want to select locations tomeasure in order to most effectively predict spatial phenomena.Often, we want to select observations which are robust against anumber of possible objective functions. Examples include minimizingthe maximum posterior variance in Gaussian Process regression,robust experimental design, and sensor placement for outbreakdetection. In this paper, we present the SubmodularSaturation algorithm, a simple and efficient algorithm with strongtheoretical approximation guarantees for cases where the possibleobjective functions exhibit submodularity, an intuitivediminishing returns property. Moreover, we prove that betterapproximation algorithms do not exist unless NP-completeproblems admit efficient algorithms. We show how our algorithm canbe extended to handle complex cost functions (incorporating non-unitobservation cost or communication and path costs). We also show howthe algorithm can be used to near-optimally trade off expected-case(e.g., the Mean Square Prediction Error in Gaussian Processregression) and worst-case (e.g., maximum predictive variance)performance. We show that many important machine learning problemsfit our robust submodular observation selection formalism, andprovide extensive empirical evaluation on several real-worldproblems. For Gaussian Process regression, our algorithm comparesfavorably with state-of-the-art heuristics described in thegeostatistics literature, while being simpler, faster and providingtheoretical guarantees. For robust experimental design, ouralgorithm performs favorably compared to SDP-based algorithms.

多くのアプリケーションでは、情報に基づいた決定を下す前に、コストのかかる一連の観測から能動的に選択する必要があります。たとえば、環境モニタリングでは、空間現象を最も効果的に予測するために、測定する場所を選択する必要があります。多くの場合、多数の可能な目的関数に対して堅牢な観測を選択する必要があります。例としては、ガウス過程回帰における最大事後分散の最小化、堅牢な実験設計、およびアウトブレイク検出のためのセンサー配置などがあります。この論文では、可能な目的関数がサブモジュラ性、つまり直感的な収穫逓減特性を示す場合に強力な理論的近似保証を備えた、シンプルで効率的なアルゴリズムであるサブモジュラ飽和アルゴリズムを紹介します。さらに、NP完全問題が効率的なアルゴリズムを許容しない限り、より優れた近似アルゴリズムは存在しないことを証明します。このアルゴリズムを拡張して、複雑なコスト関数(非単位観測コストや通信およびパスコストを組み込んだもの)を処理する方法を示します。また、このアルゴリズムを使用して、期待されるケース(ガウス過程回帰における平均二乗予測誤差など)と最悪のケース(最大予測分散など)のパフォーマンスをほぼ最適にトレードオフする方法も示します。多くの重要な機械学習の問題が、当社の堅牢なサブモジュラー観測選択形式に適合し、いくつかの現実の問題に対する広範な経験的評価を提供することを示します。ガウス過程回帰の場合、当社のアルゴリズムは、地質統計学の文献に記載されている最先端のヒューリスティックと比較して遜色なく、よりシンプルで高速であり、理論的な保証を提供します。堅牢な実験設計の場合、当社のアルゴリズムはSDPベースのアルゴリズムと比較して優れたパフォーマンスを発揮します。

Automatic PCA Dimension Selection for High Dimensional Data and Small Sample Sizes
高次元データと小さなサンプルサイズのための自動PCA寸法選択

Bayesian inference from high-dimensional data involves theintegration over a large number of model parameters. Accurateevaluation of such high-dimensional integrals raises a unique setof issues. These issues are illustrated using the exemplar ofmodel selection for principal component analysis (PCA). A Bayesianmodel selection criterion, based on a Laplace approximation to themodel evidence for determining the number of signal principalcomponents present in a data set, has previously been show toperform well on various test data sets. Using simulated data weshow that for d-dimensional data and small sample sizes, N,the accuracy of this model selection method is strongly affectedby increasing values of d. By taking proper account of thecontribution to the evidence from the large number ofmodel parameters we show that model selection accuracy issubstantially improved. The accuracy of the improved model evidence is studiedin the asymptotic limit d → ∞ at fixed ratioα = N/d, with α < 1. In this limit, model selectionbased upon the improved model evidence agrees with a frequentisthypothesis testing approach.

高次元データからのベイズ推論には、多数のモデルパラメータの積分が含まれます。このような高次元積分の正確な評価には、一連の特有の問題が伴います。これらの問題は、主成分分析(PCA)のモデル選択の例を使用して説明されます。データセット内に存在する信号主成分の数を決定するためのモデル証拠に対するラプラス近似に基づくベイズモデル選択基準は、さまざまなテストデータセットで良好なパフォーマンスを示すことがこれまでに示されています。シミュレーションデータを使用して、d次元データと小さいサンプルサイズNの場合、このモデル選択方法の精度はdの値の増加によって大きく影響を受けることを示します。多数のモデルパラメータからの証拠への寄与を適切に考慮することで、モデル選択の精度が大幅に向上することを示します。改良されたモデルの証拠の精度は、α< 1の固定比率 α= N/dで漸近限界d→ ∞ で調査されます。この限界では、改良されたモデルの証拠に基づくモデル選択は、頻度主義の仮説検定アプローチと一致します。

Learning Bounded Treewidth Bayesian Networks
有界ツリー幅ベイジアンネットワークの学習

With the increased availability of data for complex domains, it isdesirable to learn Bayesian network structures that are sufficientlyexpressive for generalization while at the same time allow fortractable inference. While the method of thin junction trees can, inprinciple, be used for this purpose, its fully greedy nature makes itprone to overfitting, particularly when data is scarce. In this workwe present a novel method for learning Bayesian networks of boundedtreewidth that employs global structure modifications and that ispolynomial both in the size of the graph and the treewidth bound. Atthe heart of our method is a dynamic triangulation that we update in away that facilitates the addition of chain structures that increasethe bound on the model’s treewidth by at most one. We demonstrate theeffectiveness of our “treewidth-friendly” method on severalreal-life data sets and show that it is superior to the greedyapproach as soon as the bound on the treewidth is nontrivial.Importantly, we also show that by making use of global operators, weare able to achieve better generalization even when learning Bayesiannetworks of unbounded treewidth.

複雑なドメインのデータが利用できる機会が増えたため、一般化に十分な表現力を持ちながら、同時に扱いやすい推論を可能にするベイジアンネットワーク構造を学習することが望ましい。原理的には、この目的に薄いジャンクションツリー法を使用できるが、その完全な貪欲性により、特にデータが不足している場合には、過剰適合になりやすい。この研究では、グローバル構造変更を採用し、グラフのサイズとツリー幅の境界の両方において多項式である、制限されたツリー幅のベイジアンネットワークを学習するための新しい方法を提示します。この方法の核となるのは、モデルのツリー幅の境界を最大で1だけ増やすチェーン構造の追加を容易にする方法で更新する動的な三角測量です。私たちは、いくつかの実際のデータセットで「ツリー幅に優しい」方法の有効性を実証し、ツリー幅の境界が自明でない場合は貪欲なアプローチよりも優れていることを示します。重要な点として、グローバル演算子を使用することで、無制限のツリー幅のベイジアンネットワークを学習する場合でも、より優れた一般化を実現できることも示します。

JNCC2: The Java Implementation Of Naive Credal Classifier 2
JNCC2:Naive Credal Classifier 2のJava実装

JNCC2 implements the naive credal classifier 2 (NCC2). This isan extension of naive Bayes to imprecise probabilities that aims atdelivering robust classifications also when dealing with small orincomplete data sets. Robustness is achieved by delivering set-valuedclassifications (that is, returning multiple classes) on theinstances for which (i) the learning set is not informative enough tosmooth the effect of choice of the prior density or (ii) theuncertainty arising from missing data prevents the reliableindication of a single class. JNCC2 is released under the GNU GPLlicense.

JNCC2は、naïve credal classifier 2 (NCC2)を実装します。これは、単純なベイズを不正確な確率に拡張したもので、小さな不完全なデータセットを扱う場合でも堅牢な分類を提供することを目的としています。堅牢性は、(i)学習セットが事前密度の選択の影響を滑らかにするほど情報量が少ない、または(ii)データの欠落から生じる不確実性が1つのクラスの信頼性表示を妨げるインスタンスに対して、セット値分類を提供する(つまり、複数のクラスを返す)ことによって達成されます。JNCC2はGNU GPLライセンスの下でリリースされています。

An Extension on “Statistical Comparisons of Classifiers over Multiple Data Sets” for all Pairwise Comparisons
すべてのペアワイズ比較のための「複数のデータセットにわたる分類子の統計的比較」の拡張

In a recently published paper in JMLR, Demar (2006) recommendsa set of non-parametric statistical tests and procedures which canbe safely used for comparing the performance of classifiers overmultiple data sets. After studying the paper, we realize that thepaper correctly introduces the basic procedures and some of themost advanced ones when comparing a control method. However, itdoes not deal with some advanced topics in depth. Regarding thesetopics, we focus on more powerful proposals of statisticalprocedures for comparing n × n classifiers. Moreover, weillustrate an easy way of obtaining adjusted and comparablep-values in multiple comparison procedures.

JMLRに最近発表された論文で、Demar(2006)は、複数のデータセットにわたる分類器のパフォーマンスを比較するために安全に使用できるノンパラメトリック統計テストと手順のセットを推奨しています。論文を研究した後、この論文では、制御方法を比較する際の基本的な手順と最も高度な手順のいくつかを正しく紹介していることがわかります。ただし、一部の高度なトピックを深く扱っていません。これらのトピックに関して、n×nの分類子を比較するための統計手順のより強力な提案に焦点を当てます。さらに、多重比較手順で調整された比較可能なp値を取得する簡単な方法を説明します。

Multi-Agent Reinforcement Learning in Common Interest and Fixed Sum Stochastic Games: An Experimental Study
共通利益と固定和確率ゲームにおけるマルチエージェント強化学習:実験的研究

Multi Agent Reinforcement Learning (MARL) has received continuallygrowing attention in the past decade. Many algorithms that vary intheir approaches to the different subtasks of MARL have beendeveloped. However, the theoretical convergence results for thesealgorithms do not give a clue as to their practical performance norsupply insights to the dynamics of the learning process itself. Thiswork is a comprehensive empirical study conducted on MGS, asimulation system developed for this purpose. It surveys theimportant algorithms in the field, demonstrates the strengths andweaknesses of the different approaches to MARL through applicationof FriendQ, OAL, WoLF, FoeQ, Rmax, and other algorithms to a varietyof fully cooperative and fully competitive domains in self andheterogeneous play, and supplies an informal analysis of theresulting learning processes. The results can aid in the design ofnew learning algorithms, in matching existing algorithms to specifictasks, and may guide further research and formal analysis of thelearning processes.

マルチエージェント強化学習(MARL)は、過去10年間で継続的に注目を集めています。MARLのさまざまなサブタスクへのアプローチが異なる多くのアルゴリズムが開発されています。ただし、これらのアルゴリズムの理論的な収束結果からは、実際のパフォーマンスに関するヒントは得られず、学習プロセス自体のダイナミクスに関する洞察も得られません。この研究では、この目的のために開発されたシミュレーションシステムであるMGSで実施された包括的な実証研究です。この分野で重要なアルゴリズムを調査し、FriendQ、OAL、WoLF、FoeQ、Rmax、およびその他のアルゴリズムを、自己プレイおよび異種プレイにおけるさまざまな完全協力および完全競争ドメインに適用することにより、MARLに対するさまざまなアプローチの長所と短所を示し、結果として得られる学習プロセスの非公式な分析を提供します。結果は、新しい学習アルゴリズムの設計、既存のアルゴリズムを特定のタスクに一致させるのに役立ち、学習プロセスのさらなる研究と正式な分析の指針となる可能性があります。

Model Selection for Regression with Continuous Kernel Functions Using the Modulus of Continuity
連続性係数を使用した連続カーネル関数による回帰のモデル選択

This paper presents a new method of model selection for regressionproblems using the modulus of continuity. For this purpose, wesuggest the prediction risk bounds of regression models using themodulus of continuity which can be interpreted as the complexity offunctions. We also present the model selection criterion referred toas the modulus of continuity information criterion (MCIC) which isderived from the suggested prediction risk bounds. The suggestedMCIC provides a risk estimate using the modulus of continuity for atrained regression model (or an estimation function) while othermodel selection criteria such as the AIC and BIC use structuralinformation such as the number of training parameters. As a result,the suggested MCIC is able to discriminate the performances oftrained regression models, even with the same structure of trainingmodels. To show the effectiveness of the proposed method, thesimulation for function approximation using the multilayerperceptrons (MLPs) was conducted. Through the simulation forfunction approximation, it was demonstrated that the suggested MCICprovides a good selection tool for nonlinear regression models, evenwith the limited size of data.

この論文では、連続係数を用いた回帰問題のためのモデル選択の新しい方法を提案します。この目的のために、関数の複雑性として解釈できる連続係数を用いた回帰モデルの予測リスク境界を提案します。また、提案された予測リスク境界から導かれる連続係数情報基準(MCIC)と呼ばれるモデル選択基準も提示します。提案されたMCICは、AICやBICなどの他のモデル選択基準がトレーニングパラメータの数などの構造情報を使用するのに対し、トレーニングされた回帰モデル(または推定関数)の連続係数を使用してリスク推定を提供します。その結果、提案されたMCICは、同じトレーニングモデル構造であっても、トレーニングされた回帰モデルの性能を区別することができます。提案された方法の有効性を示すために、多層パーセプトロン(MLP)を使用した関数近似のシミュレーションを行った。関数近似のシミュレーションを通じて、提案されたMCICは、データのサイズが限られている場合でも、非線形回帰モデルに適した選択ツールを提供することが実証されました。

Visualizing Data using t-SNE
t-SNEを使用したデータの可視化

We present a new technique called “t-SNE” that visualizeshigh-dimensional data by giving each datapoint a location in a two orthree-dimensional map. The technique is a variation of StochasticNeighbor Embedding (Hinton and Roweis, 2002) that is much easier to optimize,and produces significantly better visualizations by reducing thetendency to crowd points together in the center of the map. t-SNE isbetter than existing techniques at creating a single map that revealsstructure at many different scales. This is particularly important forhigh-dimensional data that lie on several different, but related,low-dimensional manifolds, such as images ofobjects from multipleclasses seen from multiple viewpoints. For visualizing the structureof very large data sets, we show how t-SNE can use random walks onneighborhood graphs to allow the implicit structure of all of the datato influence the way in which a subset of the data is displayed. Weillustrate the performance of t-SNE on a wide variety of data sets andcompare it with many other non-parametric visualization techniques,including Sammon mapping, Isomap, and Locally Linear Embedding. Thevisualizations produced by t-SNE are significantly better than thoseproduced by the other techniques on almost all of the data sets.

私たちは、各データポイントに2次元または3次元マップ内の位置を与えることで高次元データを視覚化する「t-SNE」という新しい手法を紹介します。この手法は、最適化がはるかに容易なStochastic Neighbor Embedding (HintonおよびRoweis、2002)のバリエーションであり、マップの中央にポイントが密集する傾向を減らすことで、視覚化が大幅に向上します。t-SNEは、さまざまなスケールで構造を明らかにする単一のマップを作成する点で、既存の手法よりも優れています。これは、複数の視点から見た複数のクラスのオブジェクトの画像など、複数の異なるが関連する低次元多様体上にある高次元データにとって特に重要です。非常に大規模なデータセットの構造を視覚化するために、私たちは、t-SNEが近傍グラフ上のランダムウォークを使用して、すべてのデータの暗黙的な構造がデータのサブセットの表示方法に影響を与える方法を示します。さまざまなデータセットでのt-SNEのパフォーマンスを示し、Sammonマッピング、Isomap、Locally Linear Embeddingなど、他の多くの非パラメトリック視覚化手法と比較します。t-SNEによって生成された視覚化は、ほぼすべてのデータセットで他の手法によって生成された視覚化よりも大幅に優れています。

Stationary Features and Cat Detection
固定機能とCat検出

Most discriminative techniques for detecting instances from objectcategories in still images consist of looping over a partition of apose space with dedicated binary classifiers. The efficiency of thisstrategy for a complex pose, that is, for fine-grained descriptions, canbe assessed by measuring the effect of sample size and pose resolutionon accuracy and computation. Two conclusions emerge: (1) fragmentingthe training data, which is inevitable in dealing with high in-classvariation, severely reduces accuracy; (2) the computational cost athigh resolution is prohibitive due to visiting a massive posepartition.To overcome data-fragmentation we propose a novel framework centeredon pose-indexed features which assign a response to a pair consistingof an image and a pose, and are designed to be stationary: theprobability distribution of the response is always the same if anobject is actually present. Such features allow for efficient,one-shot learning of pose-specific classifiers. To avoid expensivescene processing, we arrange these classifiers in a hierarchy based onnested partitions of the pose; as in previous work on coarse-to-finesearch, this allows for efficient processing.The hierarchy is then “folded” for training: all the classifiers ateach level are derived from one base predictor learned from all thedata. The hierarchy is “unfolded” for testing: parsing a scene amountsto examining increasingly finer object descriptions only when there issufficient evidence for coarser ones. In this way, the detectionresults are equivalent to an exhaustive search at high resolution. Weillustrate these ideas by detecting and localizing cats in highlycluttered greyscale scenes.

静止画像内のオブジェクトカテゴリからインスタンスを検出するための識別手法のほとんどは、専用のバイナリ分類器を使用してポーズ空間のパーティションをループ処理するものです。複雑なポーズ、つまりきめの細かい記述に対するこの戦略の効率は、サンプルサイズとポーズ解像度が精度と計算に与える影響を測定することで評価できます。2つの結論が導き出されます。(1)クラス内変動が大きい場合、トレーニングデータを断片化することは避けられませんが、精度が大幅に低下します。(2)高解像度での計算コストは、大規模なポーズパーティションにアクセスするため、法外に高くなります。データの断片化を克服するために、画像とポーズのペアに応答を割り当て、静止するように設計されたポーズインデックスフィーチャを中心とした新しいフレームワークを提案します。つまり、オブジェクトが実際に存在する場合、応答の確率分布は常に同じです。このようなフィーチャにより、ポーズ固有の分類器を効率的に1回で学習できます。コストのかかるシーン処理を回避するために、ポーズのネストされたパーティションに基づいてこれらの分類器を階層的に配置します。粗から細への検索に関する以前の研究と同様に、これにより効率的な処理が可能になります。階層はトレーニング用に「折り畳まれ」ます。各レベルのすべての分類器は、すべてのデータから学習された1つの基本予測子から派生します。階層はテスト用に「展開」されます。シーンを解析することは、粗いオブジェクトの説明の十分な証拠がある場合にのみ、より細かいオブジェクトの説明を調べることを意味します。このように、検出結果は高解像度での徹底的な検索に相当します。これらのアイデアを、非常に雑然としたグレースケールのシーンで猫を検出して位置を特定することで説明します。

Active Learning of Causal Networks with Intervention Experiments and Optimal Designs
介入実験と最適設計による因果ネットワークのアクティブラーニング

The causal discovery from data is important for various scientificinvestigations. Because we cannot distinguish the different directedacyclic graphs (DAGs) in a Markov equivalence class learned fromobservational data, we have to collect further information on causalstructures from experiments with external interventions. In thispaper, we propose an active learning approach for discovering causalstructures in which we first find a Markov equivalence class fromobservational data, and then we orient undirected edges in everychain component via intervention experiments separately. In theexperiments, some variables are manipulated through externalinterventions. We discuss two kinds of intervention experiments,randomized experiment and quasi-experiment. Furthermore, we give twooptimal designs of experiments, a batch-intervention design and asequential-intervention design, to minimize the number ofmanipulated variables and the set of candidate structures based onthe minimax and the maximum entropy criteria. We show theoreticallythat structural learning can be done locally in subgraphs of chaincomponents without need of checking illegal v-structures and cyclesin the whole network and that a Markov equivalence subclass obtainedafter each intervention can still be depicted as a chain graph.

データからの因果発見は、さまざまな科学的調査にとって重要です。観測データから学習したマルコフ同値類内の異なる有向非巡回グラフ(DAG)を区別できないため、外部介入を伴う実験から因果構造に関するさらなる情報を収集する必要があります。この論文では、因果構造を発見するための能動学習アプローチを提案します。このアプローチでは、まず観測データからマルコフ同値類を見つけ、次に介入実験によってチェーンの各コンポーネントの無向エッジを個別に方向付けます。実験では、一部の変数が外部介入によって操作されます。ランダム化実験と準実験の2種類の介入実験について説明します。さらに、バッチ介入設計と順次介入設計の2つの最適な実験設計を示し、ミニマックス基準と最大エントロピー基準に基づいて、操作変数の数と候補構造のセットを最小化します。私たちは、ネットワーク全体の不正なv構造やサイクルをチェックする必要なく、チェーンコンポーネントのサブグラフ内で構造学習をローカルに実行できること、また各介入後に得られるマルコフ同値サブクラスをチェーングラフとして表現できることを理論的に示します。

SimpleMKL
シンプルMKL

Multiple kernel learning (MKL) aims at simultaneously learning akernel and the associated predictor in supervised learningsettings. For the support vector machine, an efficient and generalmultiple kernel learning algorithm, based on semi-infinite linearprogramming, has been recently proposed. This approach has opened newperspectives since it makes MKL tractable for large-scale problems, byiteratively using existing support vector machine code. However, itturns out that this iterative algorithm needs numerous iterations forconverging towards a reasonable solution. In this paper, we addressthe MKL problem through a weighted 2-norm regularization formulationwith an additional constraint on the weights that encourages sparsekernel combinations. Apart from learning the combination, we solve astandard SVM optimization problem, where the kernel is defined as alinear combination of multiple kernels. We propose an algorithm,named SimpleMKL, for solving this MKL problem and provide a newinsight on MKL algorithms based on mixed-norm regularization byshowing that the two approaches are equivalent. We show how SimpleMKLcan be applied beyond binary classification, for problems likeregression, clustering (one-class classification) or multiclassclassification. Experimental results show that the proposed algorithmconverges rapidly and that its efficiency compares favorably to otherMKL algorithms. Finally, we illustrate the usefulness of MKL for someregressors based on wavelet kernels and on some model selectionproblems related to multiclass classification problems.

マルチカーネル学習(MKL)は、教師あり学習設定でカーネルと関連する予測子を同時に学習することを目的としています。サポートベクターマシンでは、半無限線形計画法に基づく効率的で汎用的なマルチカーネル学習アルゴリズムが最近提案されました。このアプローチは、既存のサポートベクターマシンコードを反復的に使用することで、大規模な問題に対してMKLを扱いやすくするため、新しい展望を開きました。ただし、この反復アルゴリズムでは、妥当なソリューションに収束するために多数の反復が必要であることが判明しました。この論文では、スパースカーネルの組み合わせを促進する重みに対する追加の制約を伴う、重み付き2ノルム正則化定式化を通じてMKL問題に対処します。組み合わせの学習とは別に、カーネルが複数のカーネルの線形結合として定義される標準的なSVM最適化問題を解決します。このMKL問題を解決するためのSimpleMKLというアルゴリズムを提案し、2つのアプローチが同等であることを示すことで、混合ノルム正則化に基づくMKLアルゴリズムに関する新しい洞察を提供します。ここでは、SimpleMKLをバイナリ分類以外にも、回帰、クラスタリング(1クラス分類)、マルチクラス分類などの問題に適用する方法を示します。実験結果では、提案されたアルゴリズムが急速に収束し、その効率が他のMKLアルゴリズムに匹敵することがわかりました。最後に、ウェーブレットカーネルに基づくいくつかの回帰器と、マルチクラス分類問題に関連するいくつかのモデル選択問題に対するMKLの有用性を示します。

On the Equivalence of Linear Dimensionality-Reducing Transformations
線形次元削減変換の等価性について

In this JMLR volume, Ye (2008) demonstrates the essentialequivalence of two sets of solutions to a generalized Fisher criterionused for linear dimensionality reduction (see Ye, 2005; Loog, 2007).Here, I point out the basic flaw in this new contribution.

このJMLRの巻では、Ye(2008)は、線形次元削減に使用される一般化されたフィッシャー基準に対する2組の解の本質的な等価性を示しています(Ye、2005;Loog、2007)。ここでは、この新しい投稿の基本的な欠点を指摘します。

Minimal Nonlinear Distortion Principle for Nonlinear Independent Component Analysis
非線形独立成分解析のための最小非線形歪み原理

It is well known that solutions to the nonlinear independentcomponent analysis (ICA) problem are highly non-unique. In thispaper we propose the “minimal nonlinear distortion” (MND)principle for tackling the ill-posedness of nonlinear ICAproblems. MND prefers the nonlinear ICA solution with theestimated mixing procedure as close as possible to linear,among all possible solutions. It also helps to avoid local optima in thesolutions. To achieve MND, we exploit a regularization term tominimize the mean square error between the nonlinear mixingmapping and the best-fitting linear one. The effect of MND onthe inherent trivial and non-trivial indeterminacies innonlinear ICA solutions is investigated. Moreover, we show thatlocal MND is closely related to the smoothness regularizerpenalizing large curvature, which provides another usefulregularizationcondition for nonlinear ICA.Experiments on synthetic data show the usefulness of the MNDprinciple for separating various nonlinear mixtures.Finally, as an application, we use nonlinear ICA with MND toseparate daily returns of a set of stocks in Hong Kong, and thelinear causal relations among them are successfully discovered.The resulting causal relations give some interesting insightsinto the stock market. Such a result can not be achieved bylinear ICA. Simulation studies also verify that when doingcausality discovery, sometimes one should not ignore thenonlinear distortion in the data generation procedure, even ifit is weak.

非線形独立成分分析(ICA)の問題に対する解法は、極めて非一意であることがよく知られています。この論文では、非線形ICA問題の不適切性に対処するための「最小非線形歪み」(MND)原理を提案します。MNDは、すべての可能な解法の中で、推定された混合手順が可能な限り線形に近い非線形ICA解法を優先します。また、解法の局所最適を回避するのにも役立ちます。MNDを実現するために、非線形混合マッピングと最も適合する線形マッピング間の平均二乗誤差を最小化するために、正則化項を利用します。非線形ICA解法に固有の自明な不確定性と非自明な不確定性に対するMNDの影響を調査します。さらに、ローカルMNDは大きな曲率をペナルティする平滑性正則化子と密接に関連していることを示し、これは非線形ICAのもう1つの有用な正則化条件を提供します。合成データでの実験は、さまざまな非線形混合物を分離するためのMND原理の有用性を示しています。最後に、アプリケーションとして、香港の株式セットの日次収益を分離するためにMNDを使用した非線形ICAを使用し、それらの間の線形因果関係を正常に検出しました。結果として得られた因果関係は、株式市場に関する興味深い洞察を提供します。このような結果は、線形ICAでは達成できません。シミュレーション研究は、因果関係の検出を行う際に、データ生成手順での非線形歪みが弱い場合でも無視してはならないことも検証しています。

On the Size and Recovery of Submatrices of Ones in a Random Binary Matrix
ランダムバイナリ行列における1のサブマトリックスのサイズと回復について

Binary matrices, and their associated submatrices of 1s, play acentral role in the study of random bipartite graphs and in core datamining problems such as frequent itemset mining (FIM). Motivated bythese connections, this paper addresses several statistical questionsregarding submatrices of 1s in a random binary matrix with independentBernoulli entries. We establish a three-point concentration result,and a related probability bound, for the size of the largest squaresubmatrix of 1s in a square Bernoulli matrix, and extend these resultsto non-square matrices and submatrices with fixed aspect ratios. Wethen consider the noise sensitivity of frequent itemset mining under asimple binary additive noise model, and show that, even at small noiselevels, large blocks of 1s leave behind fragments of only logarithmicsize. As a result, standard FIM algorithms, which search only forsubmatrices of 1s, cannot directly recover such blocks when noise ispresent. On the positive side, we show that an error-tolerantfrequent itemset criterion can recover a submatrix of 1s against abackground of 0s plus noise, even when the size of the submatrix of 1sis very small.

バイナリマトリックスとそれに関連する1のサブマトリックスは、ランダム2部グラフの研究や頻繁アイテムセットマイニング(FIM)などのコアデータマイニング問題で中心的な役割を果たします。これらの関連性に着目して、この論文では、独立したベルヌーイエントリを持つランダムバイナリマトリックス内の1のサブマトリックスに関するいくつかの統計的疑問を取り上げます。正方ベルヌーイマトリックス内の1の最大の正方サブマトリックスのサイズについて、3点集中の結果と関連する確率の境界を確立し、これらの結果を固定アスペクト比を持つ非正方マトリックスとサブマトリックスに拡張します。次に、単純なバイナリ加法ノイズモデルでの頻繁アイテムセットマイニングのノイズ感度を検討し、ノイズレベルが小さい場合でも、1の大きなブロックが対数サイズのフラグメントのみを残すことを示します。その結果、1のサブマトリックスのみを検索する標準FIMアルゴリズムは、ノイズが存在する場合、そのようなブロックを直接回復できません。良い面としては、エラー許容度の高い頻繁なアイテムセット基準により、1のサブマトリックスのサイズが非常に小さい場合でも、0とノイズの背景に対して1のサブマトリックスを回復できることを示しています。

Non-Parametric Modeling of Partially Ranked Data
部分的にランク付けされたデータのノンパラメトリックモデリング

Statistical models on full and partial rankings of n items areoften of limited practical use for large n due to computationalconsideration. We explore the use of non-parametric models forpartially ranked data and derive computationally efficientprocedures for their use for large n. The derivations arelargely possible through combinatorial and algebraic manipulationsbased on the lattice of partial rankings. A bias-variance analysisand an experimental study demonstrate the applicability of theproposed method.

n個の項目の完全および部分的なランク付けに関する統計モデルは、計算上の考慮事項により、大きなnに対しては限られた実用化ができないことがよくあります。部分的にランク付けされたデータに対するノンパラメトリックモデルの使用を検討し、大きなnに使用するための計算効率の高い手順を導き出します。導出は、部分ランキングの格子に基づく組み合わせ操作と代数的操作によって主に可能です。バイアス分散分析と実験的研究により、提案された方法の適用性が実証されています。

Model Selection in Kernel Based Regression using the Influence Function
Influence 関数を使用したカーネルベース回帰でのモデル選択

Recent results about the robustness of kernel methods involve theanalysis of influence functions. By definition the influence functionis closely related to leave-one-out criteria. In statisticallearning, the latter is often used to assess the generalization of amethod. In statistics, the influence function is used in a similar wayto analyze the statistical efficiency of a method. Links between bothworlds are explored. The influence function is related to the firstterm of a Taylor expansion. Higher order influence functions arecalculated. A recursive relation between these terms is foundcharacterizing the full Taylor expansion. It is shown how to evaluateinfluence functions at a specific sample distribution to obtain anapproximation of the leave-one-out error. A specific implementation isproposed using a L1 loss in the selection of the hyperparametersand a Huber loss in the estimation procedure. The parameter in theHuber loss controlling the degree of robustness is optimized aswell. The resulting procedure gives good results, even when outliersare present in the data.

カーネル法の堅牢性に関する最近の結果には、影響関数の分析が含まれています。定義により、影響関数は、leave-one out基準と密接に関連しています。統計学習では、後者は、方法の一般化を評価するためによく使用されます。統計では、影響関数は、方法の統計的効率を分析するために同様の方法で使用されます。両方の世界の間のリンクが調査されます。影響関数は、テイラー展開の最初の項に関連しています。より高次の影響関数が計算されます。これらの項間の再帰関係が見つかり、完全なテイラー展開を特徴付けます。特定のサンプル分布で影響関数を評価して、leave-one outエラーの近似値を取得する方法が示されています。ハイパーパラメータの選択でL1損失を使用し、推定手順でHuber損失を使用する特定の実装が提案されています。堅牢性の程度を制御するHuber損失のパラメータも最適化されています。結果として得られる手順では、データに外れ値が存在する場合でも、良好な結果が得られます。

Learning to Select Features using their Properties
プロパティを使用したフィーチャの選択の学習

Feature selection is the task of choosing a small subset of featuresthat is sufficient to predict the target labels well. Here, insteadof trying to directly determine which features are better, we attemptto learn the properties of good features. For this purpose we assumethat each feature is represented by a set of properties, referredto as meta-features. This approach enables prediction of thequality of features without measuring their value on the traininginstances. We use this ability to devise new selection algorithmsthat can efficiently search for new good features in the presenceof a huge number of features, and to dramatically reduce the numberof feature measurements needed. We demonstrate our algorithms on ahandwritten digit recognition problem and a visual object categoryrecognition problem. In addition, we show how this novel viewpointenables derivation of better generalization bounds for the joint learningproblem of selection and classification, and how it contributes toa better understanding of the problem. Specifically, in the contextof object recognition, previous works showed that it is possible tofind one set of features which fits most object categories (aka auniversal dictionary). Here we use our framework to analyzeone such universal dictionary and find that the quality of featuresin this dictionary can be predicted accurately by its meta-features.

特徴選択とは、ターゲットラベルを適切に予測するのに十分な特徴の小さなサブセットを選択するタスクです。ここでは、どの特徴が優れているかを直接判断するのではなく、優れた特徴の特性を学習しようとします。この目的のために、各特徴はメタ特徴と呼ばれる一連の特性によって表されると仮定します。このアプローチにより、トレーニングインスタンスでその値を測定することなしに、特徴の品質を予測できます。この能力を利用して、膨大な数の特徴が存在する場合に新しい優れた特徴を効率的に検索し、必要な特徴測定の数を大幅に削減できる新しい選択アルゴリズムを考案します。手書きの数字認識問題と視覚オブジェクトカテゴリ認識問題でアルゴリズムを実証します。さらに、この新しい観点によって、選択と分類の共同学習問題に対するより優れた一般化境界を導出できるようになり、それが問題の理解を深めるのにどのように役立つかを示します。具体的には、オブジェクト認識のコンテキストでは、以前の研究で、ほとんどのオブジェクトカテゴリに適合する1つの特徴セット(つまり、ユニバーサルディクショナリ)を見つけることが可能であることが示されました。ここでは、私たちのフレームワークを使用してそのようなユニバーサル辞書の1つを分析し、この辞書の機能の品質がそのメタ機能によって正確に予測できることを発見しました。

Probabilistic Characterization of Random Decision Trees
ランダム決定木の確率的特性評価

In this paper we use the methodology introduced by Dhurandhar and Dobra (2009) foranalyzing the error of classifiers and the model selection measures,to analyze decision tree algorithms. The methodology consists ofobtaining parametric expressions for the moments of the generalizationerror (GE) for the classification model of interest, followed byplotting these expressions for interpretability. The major challengein applying the methodology to decision trees, the main theme of thiswork, is customizing the generic expressions for the moments of GE tothis particular classification algorithm. The specific contributionswe make in this paper are: (a) we primarily characterize a subclass ofdecision trees namely, Random decision trees, (b) we discuss how theanalysis extends to other decision tree algorithms and (c) in order toextend the analysis to certain model selection measures, we generalizethe relationships between the moments of GE and moments of the modelselection measures given in (Dhurandhar and Dobra, 2009) to randomized classificationalgorithms. An empirical comparison of the proposed method with MonteCarlo and distribution free bounds obtained using Breiman’s formula,depicts the advantages of the method in terms of running time andaccuracy. It thus showcases the use of the deployed methodology as anexploratory tool to study learning algorithms.

この論文では、DhurandharとDobra (2009)が分類器のエラーとモデル選択尺度を分析するために導入した方法論を使用して、決定木アルゴリズムを分析します。この方法論は、対象の分類モデルの一般化エラー(GE)のモーメントのパラメーター式を取得し、解釈可能性のためにこれらの式をプロットすることから構成されます。この方法論を決定木に適用する際の主な課題は、この研究の主要テーマである、この特定の分類アルゴリズムに対するGEモーメントの汎用式をカスタマイズすることです。この論文で私たちが行う具体的な貢献は次のとおりです。(a)主に、ランダム決定木という決定木のサブクラスを特徴付けます。(b)この分析が他の決定木アルゴリズムにどのように拡張されるかについて説明します。(c)分析を特定のモデル選択尺度に拡張するために、(DhurandharとDobra、2009)で示されているGEモーメントとモデル選択尺度のモーメントの関係をランダム分類アルゴリズムに一般化します。提案された方法とモンテカルロ法、およびブレイマンの公式を使用して得られた分布自由境界との経験的比較により、実行時間と精度の点でこの方法の利点が明らかになりました。これにより、展開された方法論を学習アルゴリズムを研究するための探索ツールとして使用できることが示されました。

Randomized Online PCA Algorithms with Regret Bounds that are Logarithmic in the Dimension
次元で対数である後悔境界を持つランダム化オンラインPCAアルゴリズム

We design an online algorithm for Principal ComponentAnalysis. In each trialthe current instance is centered and projected intoa probabilistically chosen low dimensional subspace.The regret of our online algorithm,that is, the total expected quadratic compression loss of theonline algorithm minus the total quadratic compression lossof the batch algorithm, is bounded by a term whose dependence on the dimension of the instances is only logarithmic.We first develop our methodology in the expert setting of onlinelearning by giving an algorithm for learning as wellas the best subset of experts of a certain size. This algorithm is then lifted to the matrix setting wherethe subsets of experts correspond to subspaces.The algorithm represents the uncertainty over the best subspaceas a density matrix whose eigenvalues are bounded.The running time is O(n2) per trial, wheren is the dimension of the instances.

私たちは、Principal ComponentAnalysisのオンラインアルゴリズムを設計します。各試行では、現在のインスタンスは中央に配置され、確率的に選択された低次元部分空間に投影されます。オンラインアルゴリズムの後悔、つまり、オンラインアルゴリズムの予想される2次圧縮損失の合計からバッチアルゴリズムの2次圧縮損失の合計を差し引いたものは、インスタンスの次元への依存性が対数のみである項によって制限されます。まず、オンライン学習の専門家設定で方法論を開発し、学習用のアルゴリズムと特定のサイズの専門家の最良のサブセットを提供します。次に、このアルゴリズムは、エキスパートのサブセットが部分空間に対応する行列設定に持ち上げられます。このアルゴリズムは、固有値が有界である密度行列として最適な部分空間の不確実性を表します。実行時間は試行あたりO(n2)で、はインスタンスの次元です。

Finding Optimal Bayesian Network Given a Super-Structure
上部構造が与えられた最適なベイジアンネットワークの探索

Classical approaches used to learn Bayesian network structure fromdata have disadvantages in terms of complexity and lower accuracy oftheir results. However, a recent empirical study has shown that ahybrid algorithm improves sensitively accuracy and speed: it learns askeleton with an independency test (IT) approach and constrains on thedirected acyclic graphs (DAG) considered during the search-and-scorephase. Subsequently, we theorize the structural constraint byintroducing the concept of super-structure S, which is anundirected graph that restricts the search to networks whose skeletonis a subgraph of S. We develop a super-structure constrainedoptimal search (COS): its time complexity is upper bounded byO(γmn), whereγm<2 depends on the maximal degree m ofS. Empirically, complexity depends on the average degreem-tilde and sparse structures allow larger graphs to becalculated. Our algorithm is faster than an optimal search by severalorders and even finds more accurate results when given a soundsuper-structure. Practically, S can be approximated by ITapproaches; significance level of the tests controls its sparseness,enabling to control the trade-off between speed and accuracy. Forincomplete super-structures, a greedily post-processed version (COS+)still enables to significantly outperform other heuristic searches.

データからベイジアンネットワーク構造を学習するために使用される従来のアプローチには、複雑さと結果の精度が低いという欠点があります。ただし、最近の実証的研究では、ハイブリッドアルゴリズムによって精度と速度が大幅に向上することが示されています。このアルゴリズムは、独立性テスト(IT)アプローチを使用してスケルトンを学習し、検索とスコア付けフェーズで考慮される有向非巡回グラフ(DAG)に制約を課します。次に、スケルトンがSのサブグラフであるネットワークに検索を制限する無向グラフであるスーパー構造Sの概念を導入して、構造的制約を理論化します。スーパー構造制約付き最適検索(COS)を開発します。その時間複雑度の上限はO(γmn)です。ここで、γm<2はSの最大次数mに依存します。経験的に、複雑度は平均次数mに依存します。チルダとスパース構造により、より大きなグラフを計算できます。このアルゴリズムは、最適検索よりも数桁高速で、健全なスーパー構造が与えられた場合は、より正確な結果を見つけます。実際には、SはITアプローチによって近似できます。テストの有意水準によってそのスパース性が制御され、速度と精度のトレードオフを制御できます。不完全なスーパー構造の場合、貪欲に後処理されたバージョン(COS+)でも、他のヒューリスティック検索を大幅に上回るパフォーマンスが得られます。

Forecasting Web Page Views: Methods and Observations
Web ページビューの予測: 方法と観察

Web sites must forecast Web page views in order to plan computerresource allocation and estimate upcoming revenue and advertisinggrowth. In this paper, we focus on extracting trends and seasonalpatterns from page view series, two dominant factors in the variationof such series. We investigate the Holt-Winters procedure and a statespace model for making relatively short-term prediction. It is foundthat Web page views exhibit strong impulsive changes occasionally.The impulses cause large prediction errors long after theiroccurrences. A method is developed to identify impulses and toalleviate their damage on prediction. We also develop a long-rangetrend and season extraction method, namely the Elastic SmoothSeason Fitting (ESSF) algorithm, to compute scalable and smoothyearly seasons. ESSF derives the yearly season by minimizing theresidual sum of squares under smoothness regularization, a quadraticoptimization problem. It is shown that for long-term prediction, ESSFimproves accuracy significantly over other methods that ignore theyearly seasonality.

私たちは、Webサイトは、コンピュータリソースの割り当てを計画し、今後の収益と広告の成長を予測するために、Webページビューを予測する必要があります。この論文では、ページビューシリーズから傾向と季節パターンを抽出することに焦点を当てています。これらは、このようなシリーズの変動における2つの主要な要因です。比較的短期的な予測を行うためのHolt-Winters手順と状態空間モデルを調査します。Webページビューは、時折、強い衝動的な変化を示すことがわかりました。この衝動は、発生してからかなり経ってから大きな予測エラーを引き起こします。衝動を識別し、予測に対するそのダメージを軽減する方法が開発されました。また、スケーラブルで滑らかな年間シーズンを計算するために、長期的な傾向と季節の抽出方法、つまりElastic SmoothSeason Fitting (ESSF)アルゴリズムも開発しました。ESSFは、平滑性正規化(2次最適化問題)の下で残差平方和を最小化することによって年間シーズンを導き出します。長期予測では、ESSFは年間の季節性を無視する他の方法よりも精度が大幅に向上することが示されています。

Ranking Individuals by Group Comparisons
グループ比較による個人のランク付け

This paper proposes new approaches to rank individuals from theirgroup comparison results. Many real-world problems are of thistype. For example, ranking players from team comparisons is importantin some sports. In machine learning, a closely related application isclassification using coding matrices. Group comparison results areusually in two types: binary indicator outcomes (wins/losses) ormeasured outcomes (scores). For each type of results, we propose newmodels for estimating individuals’ abilities, and hence a ranking ofindividuals. The estimation is carried out by solving convexminimization problems, for which we develop easy and efficientsolution procedures. Experiments on real bridge records andmulti-class classification demonstrate the viability of the proposedmodels.

この論文では、グループの比較結果から個人をランク付けするための新しいアプローチを提案します。現実世界の多くの問題は、このタイプです。たとえば、チームの比較から選手をランク付けすることは、一部のスポーツでは重要です。機械学習では、密接に関連するアプリケーションがコーディング行列を使用した分類です。グループ比較の結果には、通常、バイナリ指標の結果(勝敗)と測定結果(スコア)の2つのタイプがあります。結果の種類ごとに、個人の能力を推定するための新しいモデル、したがって個人のランキングを提案します。推定は、凸極化問題を解くことによって行われ、そのための簡単で効率的な解法を開発します。実際の橋梁記録と多クラス分類に関する実験は、提案されたモデルの実行可能性を示しています。

A Moment Bound for Multi-hinge Classifiers
マルチヒンジ分類器の束縛モーメント

The success of support vector machines in binary classification relies onthe fact that hinge loss employed in the risk minimization targets theBayes rule. Recent research explores some extensions of this large marginbased method to the multicategory case. We show a moment bound forthe so-called multi-hinge loss minimizers based ontwo kinds of complexity constraints: entropy with bracketing and empiricalentropy. Obtaining such a result based on the latter is harder than findingone based on the former. We obtain fast rates of convergence that adapt to the unknown margin.

二項分類におけるサポートベクターマシンの成功は、リスク最小化で使用されるヒンジ損失がベイズの法則を対象としているという事実に依存しています。最近の研究では、この大きなマージンベースの方法をマルチカテゴリのケースに拡張する方法が調査されています。私たちは、ブラケットによるエントロピーと経験的エントロピーという2種類の複雑さの制約に基づく、いわゆるマルチヒンジ損失最小化のモーメント限界を示す。後者に基づいてそのような結果を得ることは、前者に基づいて1つを見つけるよりも困難です。未知のマージンに適応する収束速度が速くなります。

HPB: A Model for Handling BN Nodes with High Cardinality Parents
HPB: 高カーディナリティの親を持つ BN ノードを処理するためのモデル

We replaced the conditional probability tables of Bayesian networknodes whose parents have high cardinality with a multilevel empiricalhierarchical Bayesian model called hierarchical pattern Bayes (HPB).The resulting Bayesian networks achieved significant performanceimprovements over Bayesian networks with the same structure andtraditional conditional probability tables, over Bayesian networkswith simpler structures like naïve Bayes and tree augmented naïveBayes, over Bayesian networks where traditional conditionalprobability tables were substituted by noisy-OR gates, default tables,decision trees and decision graphs and over Bayesian networksconstructed after a cardinality reduction preprocessing phase usingthe agglomerative information bottleneck method. Our main tests tookplace in important fraud detection domains, which are characterized bythe presence of high cardinality attributes and by the existence ofrelevant interactions among them. Other tests, over UCI data sets,show that HPB may have a quite wide applicability.

私たちは、親が高カーディナリティであるベイジアンネットワークノードの条件付き確率テーブルを、階層パターンベイズ(HPB)と呼ばれるマルチレベルの経験的階層ベイジアンモデルに置き換えました。結果として得られたベイジアンネットワークは、同じ構造と従来の条件付き確率テーブルを持つベイジアンネットワーク、ナイーブベイズやツリー拡張ナイーブベイズなどのより単純な構造を持つベイジアンネットワーク、従来の条件付き確率テーブルをノイズORゲート、デフォルトテーブル、決定木、決定グラフに置き換えたベイジアンネットワーク、および凝集情報ボトルネック法を使用してカーディナリティ削減前処理フェーズ後に構築されたベイジアンネットワークに比べて、パフォーマンスが大幅に向上しました。主なテストは、高カーディナリティ属性の存在とそれらの間の関連する相互作用の存在を特徴とする重要な不正検出ドメインで実施しました。UCIデータセットに対するその他のテストでは、HPBの適用範囲がかなり広い可能性があることが示されています。

Gradient Tree Boosting for Training Conditional Random Fields
条件付きランダムフィールドを訓練するための勾配木ブースティング

Conditional random fields (CRFs) provide a flexible and powerful modelfor sequence labeling problems. However, existing learning algorithmsare slow, particularly in problems with large numbers of potentialinput features and feature combinations. This paper describes a newalgorithm for training CRFs via gradient tree boosting. In treeboosting, the CRF potential functions are represented as weighted sumsof regression trees, which provide compact representations of featureinteractions. So the algorithm does not explicitly consider thepotentially large parameter space. As a result, gradient treeboosting scales linearly in the order of the Markov model and in theorder of the feature interactions, rather than exponentially as inprevious algorithms based on iterative scaling and gradient descent.Gradient tree boosting also makes it possible to use instanceweighting (as in C4.5) and surrogate splitting (as in CART) to handlemissing values. Experimental studies of the effectiveness of thesetwo methods (as well as standard imputation and indicator featuremethods) show that instance weighting is the best method in most caseswhen feature values are missing at random.

条件付きランダムフィールド(CRF)は、シーケンスのラベル付け問題に柔軟で強力なモデルを提供します。ただし、既存の学習アルゴリズムは、特に潜在的な入力機能と機能の組み合わせが多数ある問題では低速です。この論文では、勾配ツリーブースティングを介してCRFをトレーニングするための新しいアルゴリズムについて説明します。ツリーブースティングでは、CRFの潜在的関数は、機能の相互作用のコンパクトな表現を提供する回帰ツリーの加重和として表されます。そのため、アルゴリズムは、潜在的に大きいパラメーター空間を明示的に考慮しません。その結果、勾配ツリーブースティングは、反復スケーリングと勾配降下法に基づく以前のアルゴリズムのように指数関数的にではなく、マルコフモデルの順序と機能の相互作用の順序で線形にスケーリングされます。勾配ツリーブースティングでは、インスタンス重み付け(C4.5の場合)とサロゲート分割(CARTの場合)を使用して欠損値を処理することもできます。これら2つの方法(および標準補完法と指標特徴法)の有効性に関する実験的研究によると、特徴値がランダムに欠落している場合、ほとんどの場合、インスタンスの重み付けが最適な方法であることが示されています。

Value Function Approximation using Multiple Aggregation for Multiattribute Resource Management
多属性リソース管理のための複数集計を使用した値関数近似

We consider the problem of estimating the value of a multiattributeresource, where the attributes are categorical or discrete in natureand the number of potential attribute vectors is very large. Theproblem arises in approximate dynamic programming when we need toestimate the value of a multiattribute resource from estimates basedon Monte-Carlo simulation. These problems have been traditionallysolved using aggregation, but choosing the right level of aggregationrequires resolving the classic tradeoff between aggregation error andsampling error. We propose a method that estimates the value of aresource at different levels of aggregation simultaneously, and thenuses a weighted combination of the estimates. Using the optimalweights, which minimizes the variance of the estimate while accountingfor correlations between the estimates, is computationally tooexpensive for practical applications. We have found that a simpleinverse variance formula (adjusted for bias), which effectivelyassumes the estimates are independent, produces near-optimalestimates. We use the setting of two levels of aggregation to explainwhy this approximation works so well.

私たちは、属性が本質的にカテゴリ型または離散型であり、潜在的な属性ベクトルの数が非常に多い、マルチ属性リソースの価値を推定する問題について考えます。この問題は、モンテカルロシミュレーションに基づく推定値からマルチ属性リソースの価値を推定する必要がある場合、近似動的計画法で発生します。これらの問題は、従来は集約を使用して解決されてきましたが、適切な集約レベルを選択するには、集約エラーとサンプリングエラーの間の従来のトレードオフを解決する必要があります。私たちは、異なる集約レベルでリソースの価値を同時に推定し、推定値の重み付けされた組み合わせを使用する方法を提案します。推定値の分散を最小化すると同時に推定値間の相関関係を考慮する最適な重みを使用することは、実際のアプリケーションでは計算コストが高すぎます。私たちは、推定値が独立していると実質的に想定する単純な逆分散式(バイアス調整済み)が、ほぼ最適な推定値を生成することを発見しました。この近似が非常にうまく機能する理由を説明するために、2つの集約レベルの設定を使用します。

Approximations for Binary Gaussian Process Classification
バイナリガウス過程分類の近似

We provide a comprehensive overview of many recent algorithmsfor approximate inference in Gaussian process models for probabilisticbinary classification. The relationships between several approachesare elucidated theoretically, and the properties of the differentalgorithms are corroborated by experimental results. We examine both1) the quality of the predictive distributions and 2) the suitabilityof the different marginal likelihood approximations for model selection(selecting hyperparameters) and compare to a gold standard based onMCMC. Interestingly, some methods produce good predictive distributionsalthough their marginal likelihood approximations are poor. Strongconclusions are drawn about the methods: The Expectation Propagationalgorithm is almost always the method of choice unless the computationalbudget is very tight. We also extend existing methods in various ways,and provide unifying code implementing all approaches.

私たちは、確率的二項分類のためのガウス過程モデルにおける近似推論のための多くの最近のアルゴリズムの包括的な概要を提供します。いくつかのアプローチ間の関係は理論的に解明されており、異なるアルゴリズムの特性は実験結果によって裏付けられています。1)予測分布の品質、および2)モデル選択(ハイパーパラメータの選択)に対するさまざまな周辺尤度近似の適合性の両方を調べ、MCMCに基づくゴールドスタンダードと比較します。興味深いことに、一部の方法では良好な予測分布が生成されますが、周辺尤度近似は不十分です。その方法について強い結論が導き出されます: Expectation Propagationアルゴリズムは、計算バジェットが非常に厳しい場合を除いて、ほとんどの場合、選択される方法です。また、既存のメソッドをさまざまな方法で拡張し、すべてのアプローチを実装する統一コードを提供します。

Consistency of Random Forests and Other Averaging Classifiers
ランダムフォレストと他の平均化分類子の一貫性

In the last years of his life, Leo Breiman promoted random forests foruse in classification. He suggested using averaging as a means ofobtaining good discrimination rules. The base classifiers used foraveraging are simple and randomized, often based on random samplesfrom the data. He left a few questions unanswered regarding theconsistency of such rules. In this paper, we give a number of theoremsthat establish the universal consistency of averaging rules. We alsoshow that some popular classifiers, including one suggested byBreiman, are not universally consistent.

彼の人生の最後の年に、レオブライマンは分類でランダムな森林foruseを推進しました。彼は、適切な識別ルールを取得する手段として平均化を使用することを提案しました。平均化に使用される基本分類器は単純でランダム化されており、多くの場合、データからのランダムなサンプルに基づいています。彼は、そのようなルールの一貫性に関して、いくつかの疑問を未解決のまま残しました。この論文では、平均化ルールの普遍的な一貫性を確立するいくつかの定理を示します。また、ブライマンによって提案されたものを含むいくつかの一般的な分類子は、普遍的に一貫していないことも示しています。

Mixed Membership Stochastic Blockmodels
混合メンバーシップストキャスティクスブロックモデル

Consider data consisting of pairwise measurements, such as presence orabsence of links between pairs of objects. These data arise, forinstance, in the analysis of protein interactions and gene regulatorynetworks, collections of author-recipient email, and social networks.Analyzing pairwise measurements with probabilistic models requiresspecial assumptions, since the usual independence or exchangeabilityassumptions no longer hold. Here we introduce a class of varianceallocation models for pairwise measurements: mixed membershipstochastic blockmodels. These models combine global parameters thatinstantiate dense patches of connectivity (blockmodel) with localparameters that instantiate node-specific variability in theconnections (mixed membership). We develop a general variationalinference algorithm for fast approximate posterior inference. Wedemonstrate the advantages of mixed membership stochastic blockmodelswith applications to social networks and protein interaction networks.

ペアワイズ測定で構成されるデータ(オブジェクトのペア間のリンクの有無など)について考えてみます。これらのデータは、例えば、タンパク質相互作用や遺伝子調節ネットワークの解析、著者と受信者の電子メールのコレクション、ソーシャルネットワークなどで発生します。確率モデルを使用してペアワイズ測定値を分析するには、通常の独立性や交換可能性の仮定が成り立たなくなるため、特別な仮定が必要です。ここでは、ペアワイズ測定の分散配分モデルのクラスである混合メンバーシップ確率的ブロックモデルを紹介します。これらのモデルは、接続性の密集したパッチをインスタンス化するグローバルパラメーター(blockmodel)と、接続のノード固有の変動性をインスタンス化するローカルパラメーター(混合メンバーシップ)を組み合わせます。高速近似事後推論のための一般的な変分推論アルゴリズムを開発します。私たちは、ソーシャルネットワークやタンパク質相互作用ネットワークへの応用を備えた混合メンバーシップ確率的ブロックモデルの利点を実証します。

Complete Identification Methods for the Causal Hierarchy
因果階層の完全な識別方法

We consider a hierarchy of queries about causal relationships ingraphical models, where each level in the hierarchyrequires more detailed information than the one below.The hierarchy consists of threelevels: associative relationships, derived from a joint distributionover the observable variables;cause-effect relationships, derived from distributions resultingfrom external interventions; andcounterfactuals, derived from distributions that span multiple”parallel worlds” and resulting from simultaneous, possiblyconflicting observations and interventions.We completely characterize cases where a given causal query can be computedfrom information lower in the hierarchy, and provide algorithms thataccomplish this computation. Specifically, we show when effects ofinterventions can be computed from observational studies, and whenprobabilities of counterfactuals can be computed from experimentalstudies. We also provide a graphical characterization of those queries whichcannot be computed (by any method) from queries at a lower layer of thehierarchy.

私たちは、グラフィカルモデルにおける因果関係に関するクエリの階層について検討します。階層の各レベルには、下位レベルよりも詳細な情報が必要です。階層は、観測可能な変数の結合分布から得られる関連関係、外部介入の結果の分布から得られる因果関係、および複数の「並行世界」にまたがり、同時に発生する、場合によっては矛盾する観測と介入の結果の分布から得られる反事実の3つのレベルで構成されます。特定の因果クエリを下位階層の情報から計算できるケースを完全に特徴付け、この計算を実行するアルゴリズムを提供します。具体的には、介入の効果を観察研究から計算できる場合と、反事実の確率を実験研究から計算できる場合を示します。また、下位階層のクエリから(どのような方法でも)計算できないクエリのグラフィカルな特徴付けも提供します。

Manifold Learning: The Price of Normalization
多様体学習:正規化の代償

We analyze the performance of a class of manifold-learningalgorithms that find their output by minimizing a quadratic formunder some normalization constraints. This class consists ofLocally Linear Embedding (LLE), Laplacian Eigenmap, Local TangentSpace Alignment (LTSA), Hessian Eigenmaps (HLLE), and Diffusionmaps. We present and prove conditions on the manifold that arenecessary for the success of the algorithms. Both the finitesample case and the limit case are analyzed. We show that thereare simple manifolds in which the necessary conditions areviolated, and hence the algorithms cannot recover the underlyingmanifolds. Finally, we present numerical results that demonstrateour claims.

私たちは、いくつかの正規化制約の下で二次形式を最小化することによって出力を見つける多様体学習アルゴリズムのクラスのパフォーマンスを分析します。このクラスは、Locally Linear Embedding(LLE)、Laplacian Eigenmap、Local TangentSpace Alignment (LTSA)、Hessian Eigenmaps (HLLE)、およびDiffusionmapsで構成されます。アルゴリズムの成功に必要な多様体上の条件を提示し、証明します。有限サンプルの場合と極限の場合の両方が分析されます。私たちは、必要な条件が破られた単純な多様体が存在するため、アルゴリズムは基礎となる多様体を回復することができないことを示す。最後に、私たちの主張を実証する数値結果を示します。

On Relevant Dimensions in Kernel Feature Spaces
カーネル特徴空間の関連次元について

We show that the relevant information of a supervised learning problemis contained up to negligible error in a finite number of leadingkernel PCA components if the kernel matches the underlying learningproblem in the sense that it can asymptotically represent the functionto be learned and is sufficiently smooth. Thus, kernels do not onlytransform data sets such that good generalization can be achievedusing only linear discriminant functions, but this transformation isalso performed in a manner which makes economical use of feature spacedimensions. In the best case, kernels provide efficient implicitrepresentations of the data for supervised learning problems.Practically, we propose an algorithm which enables us to recover thenumber of leading kernel PCA components relevant for goodclassification. Our algorithm can therefore be applied (1) to analyzethe interplay of data set and kernel in a geometric fashion, (2) toaid in model selection, and (3) to denoise in feature space in orderto yield better classification results.

私たちは、カーネルが学習すべき関数を漸近的に表現でき、十分に滑らかであるという意味で基礎学習問題に一致する場合、教師あり学習問題の関連情報は、無視できる誤差まで、主要なカーネルPCAコンポーネントの有限数に含まれることを示します。したがって、カーネルは、線形判別関数のみを使用して優れた一般化を達成できるようにデータセットを変換するだけでなく、この変換は、特徴空間次元を経済的に使用する方法でも実行されます。最良の場合、カーネルは、教師あり学習問題に対してデータの効率的な暗黙的表現を提供します。実際には、優れた分類に関連する主要なカーネルPCAコンポーネントの数を回復できるアルゴリズムを提案します。したがって、このアルゴリズムは、(1)データセットとカーネルの相互作用を幾何学的に分析するため、(2)モデル選択を支援するため、(3)より優れた分類結果を得るために特徴空間でノイズを除去するために適用できます。

LIBLINEAR: A Library for Large Linear Classification
LIBLINEAR: 大規模線形分類のライブラリ

LIBLINEAR is an open source library for large-scale linearclassification. It supports logistic regression and linear supportvector machines. We provide easy-to-use command-line tools andlibrary calls for users and developers. Comprehensive documents are available for both beginners and advancedusers. Experiments demonstrate that LIBLINEAR is very efficient onlarge sparse data sets.

LIBLINEARは、大規模な線形分類のためのオープンソースライブラリです。ロジスティック回帰と線形サポートベクトルマシンをサポートします。ユーザーと開発者向けに、使いやすいコマンドラインツールとライブラリ呼び出しを提供します。包括的なドキュメントは、初心者と上級者の両方で利用できます。実験では、LIBLINEARが大規模なスパースデータセットに対して非常に効率的であることが示されています。

Learning Balls of Strings from Edit Corrections
Edit Correctionsからの弦のボールの学習

When facing the question of learning languages in realistic settings,one has to tackle several problems that do not admit simplesolutions. On the one hand, languages are usually defined by complexgrammatical mechanisms for which the learning results arepredominantly negative, as the few algorithms are not really able tocope with noise. On the other hand, the learning settings themselvesrely either on too simple information (text) or on unattainable one(query systems that do not exist in practice, nor can besimulated). We consider simple but sound classes of languages definedvia the widely used edit distance: the balls of strings. We propose tolearn them with the help of a new sort of queries, called thecorrection queries: when a string is submitted to the Oracle, eithershe accepts it if it belongs to the target language, or she proposes acorrection, that is, a string of the language close to the query withrespect to the edit distance. We show that even if the good balls arenot learnable in Angluin’s MAT model, they can be learnedfrom a polynomial number of correction queries. Moreover, experimentalevidence simulating a human Expert shows that this algorithm isresistant to approximate answers.

現実的な設定で言語を学習するという問題に直面すると、単純な解決法が認められないいくつかの問題に取り組む必要があります。一方では、言語は通常、複雑な文法メカニズムによって定義されますが、その学習結果は主に否定的なものになります。これは、少数のアルゴリズムが実際にはノイズに対処できないためです。他方では、学習設定自体が、単純すぎる情報(テキスト)または達成不可能な情報(実際には存在せず、シミュレートもできないクエリシステム)に依存しています。私たちは、広く使用されている編集距離によって定義される、単純だが健全な言語クラス、つまり文字列のボールについて検討します。私たちは、修正クエリと呼ばれる新しい種類のクエリの助けを借りて、それらを学習することを提案します。文字列がOracleに送信されると、Oracleは、それがターゲット言語に属している場合はそれを受け入れるか、または、編集距離に関してクエリに近い言語の文字列である修正を提案します。良いボールがAngluinのMATモデルで学習できない場合でも、多項式数の訂正クエリから学習できることを示します。さらに、人間のエキスパートをシミュレートする実験的証拠は、このアルゴリズムが近似回答に耐性があることを示しています。

Classification with a Reject Option using a Hinge Loss
ヒンジ損失を使用したリジェクトオプションによる分類

We consider the problem of binary classification where theclassifier can, for a particular cost, choose not to classify anobservation. Just as in the conventional classification problem,minimization of the sample average of the cost is a difficultoptimization problem. As an alternative, we propose the optimizationof a certain convex loss function φ, analogous to the hingeloss used in support vector machines (SVMs). Its convexity ensuresthat the sample average of this surrogate loss can be efficientlyminimized. We study its statistical properties. We show thatminimizing the expected surrogate lossthe φ-riskalsominimizes the risk. We also study the rate at which the φ-riskapproaches its minimum value. We show that fast rates are possiblewhen the conditional probability P(Y=1|X) is unlikely to beclose to certain critical values.

私たちは、分類器が特定のコストで観測を分類しないことを選択できる二項分類の問題を考えます。従来の分類問題と同様に、コストのサンプル平均を最小化することは難しい最適化問題です。別の方法として、サポートベクターマシン(SVM)で使用されるヒンジ損失に類似した、特定の凸損失関数φの最適化を提案します。その凸性により、この代理損失のサンプル平均を効率的に最小化できます。私たちはその統計的特性を研究します。私たちはそれを示します予想される代理損失を最小限に抑えるφリスクまた、リスクを最小限に抑えます。また、φリスクが最小値に近づく速度についても調査します。条件付き確率P(Y=1|X)が特定の臨界値に近づく可能性は低いです。

Exponentiated Gradient Algorithms for Conditional Random Fields and Max-Margin Markov Networks
条件付き確率場と最大余裕マルコフネットワークのためのべき乗勾配アルゴリズム

Log-linear and maximum-margin models are two commonly-used methods insupervised machine learning, and are frequently used in structuredprediction problems. Efficient learning of parameters in these modelsis therefore an important problem, and becomes a key factor whenlearning from very large data sets. This paper describesexponentiated gradient (EG) algorithms for training such models, whereEG updates are applied to the convex dual of either the log-linear ormax-margin objective function; the dual in both the log-linear andmax-margin cases corresponds to minimizing a convex function withsimplex constraints. We study both batch and online variants of thealgorithm, and provide rates of convergence for both cases. In themax-margin case, O(1/ε) EG updates are required toreach a given accuracy ε in the dual; in contrast, forlog-linear models only O(log(1/ε)) updates arerequired. For both the max-margin and log-linear cases, our boundssuggest that the online EG algorithm requires a factor of n lesscomputation to reach a desired accuracy than the batch EG algorithm,where n is the number of training examples. Our experiments confirmthat the online algorithms are much faster than the batch algorithmsin practice. We describe how the EG updates factor in a convenientway for structured prediction problems, allowing the algorithms to beefficiently applied to problems such as sequence learning or naturallanguage parsing. We perform extensive evaluation of the algorithms,comparing them to L-BFGS and stochastic gradient descent forlog-linear models, and to SVM-Struct for max-marginmodels. The algorithms are applied to a multi-classproblem as well as to a more complex large-scale parsing task. In allthese settings, the EG algorithms presented here outperform the othermethods.

対数線形モデルと最大マージンモデルは、教師あり機械学習でよく使用される2つの方法であり、構造化予測問題で頻繁に使用されます。したがって、これらのモデルのパラメータの効率的な学習は重要な問題であり、非常に大規模なデータセットから学習する場合の重要な要素になります。この論文では、このようなモデルをトレーニングするための指数勾配(EG)アルゴリズムについて説明します。このアルゴリズムでは、EG更新が対数線形または最大マージン目的関数の凸双対に適用されます。対数線形と最大マージンのどちらの場合も、双対は単体制約を持つ凸関数の最小化に対応します。アルゴリズムのバッチバリアントとオンラインバリアントの両方を調査し、両方のケースの収束率を示します。最大マージンの場合、双対で特定の精度 ε に到達するには、O(1/ε)回のEG更新が必要です。対照的に、対数線形モデルの場合は、O(log(1/ε))回の更新のみが必要です。最大マージンと対数線形の両方のケースにおいて、オンラインEGアルゴリズムではバッチEGアルゴリズムよりもn倍少ない計算量で目的の精度に達することが境界から示唆されます(nはトレーニング例の数)。実験により、オンラインアルゴリズムは実際にはバッチアルゴリズムよりもはるかに高速であることが確認されました。EGが構造化予測問題に対して便利な方法で係数を更新し、シーケンス学習や自然言語解析などの問題にアルゴリズムを効率的に適用できるようにする方法について説明します。アルゴリズムの広範な評価を実行し、対数線形モデルの場合はL-BFGSおよび確率的勾配降下法、最大マージンモデルの場合はSVM-Structと比較します。アルゴリズムは、マルチクラス問題だけでなく、より複雑な大規模解析タスクにも適用されます。これらすべての設定において、ここで紹介するEGアルゴリズムは他の方法よりも優れています。

Learning from Multiple Sources
複数のソースから学ぶ

We consider the problem of learning accurate models from multiplesources of “nearby” data. Given distinct samples from multiple datasources and estimates of the dissimilarities between these sources, weprovide a general theory of which samples should be used to learnmodels for each source. This theory is applicable in a broaddecision-theoretic learning framework, and yields general results forclassification and regression. A key component of our approach is thedevelopment of approximate triangle inequalities for expected loss,which may be of independent interest. We discuss the related problemof learning parameters of a distribution from multiple data sources.Finally, we illustrate our theory through a series of syntheticsimulations.

私たちは、「近くの」データの複数のソースから正確なモデルを学習する問題を考えます。複数のデータソースからの異なるサンプルと、これらのソース間の非類似性の推定値を考慮して、各ソースのモデルを学習するためにどのサンプルを使用すべきかについての一般的な理論を提供します。この理論は、広範な意思決定理論の学習フレームワークに適用でき、分類と回帰の一般的な結果をもたらします。私たちのアプローチの重要な要素は、予想損失の近似三角形の不等式の開発であり、これは独立した関心事である可能性があります。複数のデータソースからの分布のパラメータの学習に関連する問題について説明します。最後に、一連の合成シミュレーションを通じて理論を説明します。

Nearly Uniform Validation Improves Compression-Based Error Bounds
ほぼ統一された検証により、圧縮ベースの誤差範囲が改善

This paper develops bounds on out-of-sample error rates for supportvector machines (SVMs). The bounds are based on the numbers of supportvectors in the SVMs rather than on VC dimension. The bounds developedhere improve on support vector counting bounds derived usingLittlestone and Warmuth’s compression-based bounding technique.

この論文では、supportvectorマシン(SVM)のサンプル外エラー率の限界を開発します。境界は、VC次元ではなく、SVM内のサポートベクトルの数に基づいています。ここで開発された境界は、LittlestoneとWarmuthの圧縮ベースの境界手法を使用して導出されたサポートベクトルカウント境界を改善します。

Regularization on Graphs with Function-adapted Diffusion Processes
関数適応拡散過程を用いたグラフ上の正則化

Harmonic analysis and diffusion on discrete data has been shown tolead to state-of-the-art algorithms for machine learning tasks,especially in the context of semi-supervised and transductivelearning. The success of these algorithms rests on the assumptionthat the function(s) to be studied (learned, interpolated, etc.) aresmooth with respect to the geometry of the data. In this paper wepresent a method for modifying the given geometry so the function(s)to be studied are smoother with respect to the modified geometry,and thus more amenable to treatment using harmonic analysis methods.Among the many possible applications, we consider the problems ofimage denoising and transductive classification. In both settings,our approach improves on standard diffusion based methods.

離散データに対する調和解析と拡散は、特に半教師あり学習と変換学習のコンテキストで、機械学習タスクの最先端のアルゴリズムにつながることが示されています。これらのアルゴリズムの成功は、調査対象の関数(学習、補間など)がデータの形状に対して滑らかであるという仮定に基づいています。この論文では、与えられたジオメトリを変更して、研究対象の関数が変更されたジオメトリに対してより滑らかになり、調和解析法を使用した処理により適するようにする方法を紹介します。多くの可能なアプリケーションの中で、画像のノイズ除去と変換分類の問題を検討します。どちらの設定でも、私たちのアプローチは標準的な拡散ベースの方法を改善します。

Value Function Based Reinforcement Learning in Changing Markovian Environments
変化するマルコフ環境における価値関数に基づく強化学習

The paper investigates the possibility of applying value functionbased reinforcement learning (RL) methods in cases when theenvironment may change over time. First, theorems are presented whichshow that the optimal value function of a discounted Markov decisionprocess (MDP) Lipschitz continuously depends on the immediate-costfunction and the transition-probability function. Dependence on thediscount factor is also analyzed and shown to benon-Lipschitz. Afterwards, the concept of (ε,δ)-MDPsis introduced, which is a generalization of MDPs andε-MDPs. In this model the environment may change overtime, more precisely, the transition function and the cost functionmay vary from time to time, but the changes must be bounded in thelimit. Then, learning algorithms in changing environments areanalyzed. A general relaxed convergence theorem for stochasticiterative algorithms is presented. We also demonstrate the resultsthrough three classical RL methods: asynchronous value iteration,Q-learning and temporal difference learning. Finally, some numericalexperiments concerning changing environments are presented.

この論文では、環境が時間とともに変化する可能性がある場合に、価値関数ベースの強化学習(RL)方法を適用する可能性について調査します。まず、割引マルコフ決定プロセス(MDP) Lipschitzの最適価値関数が、即時コスト関数と遷移確率関数に継続的に依存することを示す定理を示します。割引係数への依存も分析され、非Lipschitzであることが示されます。その後、MDPと ε-MDPの一般化である(ε,δ)-MDPの概念が導入されます。このモデルでは、環境が時間とともに変化する可能性があります。より正確には、遷移関数とコスト関数は時々変化する可能性がありますが、変化は制限内に制限される必要があります。次に、変化する環境での学習アルゴリズムを分析します。確率的反復アルゴリズムの一般的な緩和収束定理を示します。また、非同期価値反復、Q学習、および時間差学習という3つの古典的なRL方法による結果を示します。最後に、変化する環境に関するいくつかの数値実験を紹介します。

A New Algorithm for Estimating the Effective Dimension-Reduction Subspace
有効次元縮小部分空間を推定するための新しいアルゴリズム

The statistical problem of estimating the effectivedimension-reduction (EDR) subspace in the multi-index regression modelwith deterministic design and additive noise is considered. A newprocedure for recovering the directions of the EDR subspace isproposed. Many methods for estimating the EDR subspace performprincipal component analysis on a family of vectors, sayβ1,…,βL, nearly lying in the EDRsubspace. This is in particular the case for the structure-adaptiveapproach proposed by Hristache et al. (2001a). In the present work, we propose toestimate the projector onto the EDR subspace by the solution to theoptimization problem minimize maxl=1,…,LβlT (I-A)βl subject to A ∈ Amwhere Am is the set of allsymmetric matrices with eigenvalues in [0,1] and trace less than orequal to m, with m being the true structural dimension. Undermild assumptions, √n-consistency of the proposed procedure isproved (up to a logarithmic factor) in the case when the structuraldimension is not larger than 4. Moreover, the stochastic error ofthe estimator of the projector onto the EDR subspace is shown todepend on L logarithmically. This enables us to use a large numberof vectors βl for estimating the EDR subspace. Theempirical behavior of the algorithm is studied through numericalsimulations.

決定論的設計と加法性ノイズを伴うマルチインデックス回帰モデルにおける有効次元削減(EDR)サブスペースを推定する統計的問題について検討します。EDRサブスペースの方向を回復する新しい手順を提案します。EDRサブスペースを推定する多くの方法では、EDRサブスペースにほぼ含まれるベクトルファミリ(β1、…、βLなど)に対して主成分分析を実行します。これは特に、Hristacheら(2001a)が提案した構造適応型アプローチの場合に当てはまる。この研究では、最適化問題minimum maxl=1,…,LβlT (I-A)βl subject to A∈Amの解によって、EDRサブスペースへの射影を推定することを提案します。ここで、Amは、固有値が[0,1]にあり、トレースがm以下(mは真の構造次元)である全対称行列の集合です。軽度の仮定の下では、構造次元が4より大きくない場合、提案された手順の √n一貫性が(対数係数まで)証明されています。さらに、EDRサブスペースへの射影子の推定値の確率誤差は、Lに対数的に依存することが示されています。これにより、EDRサブスペースを推定するために多数のベクトル βlを使用できます。アルゴリズムの実験的動作は、数値シミュレーションによって研究されています。

Universal Multi-Task Kernels
ユニバーサルマルチタスクカーネル

In this paper we are concerned with reproducing kernel Hilbertspaces HK of functions from an input space into a Hilbertspace Y, an environment appropriate for multi-tasklearning. The reproducing kernel K associated to HK hasits values as operators on Y. Our primary goal here is toderive conditions which ensure that the kernel K is universal.This means that on every compact subset of the input space, everycontinuous function with values in Y can be uniformlyapproximated by sections of the kernel. We provide variouscharacterizations of universal kernels and highlight them withseveral concrete examples of some practical importance. Our analysisuses basic principles of functional analysis and especially theuseful notion of vector measures which we describe in sufficientdetail to clarify our results.

この論文では、入力空間からマルチタスク学習に適した環境であるヒルベルトスペースYへの関数のカーネルヒルベルトスペースHKの再現に関心があります。HKに関連付けられた再生カーネルKは、Y上の演算子として値を持ちます。ここでの主な目標は、カーネルKが普遍であることを保証する条件を導出することです。これは、入力空間のすべてのコンパクトなサブセットで、Yの値を持つeverycontinuous関数をカーネルのセクションによって一様に近似できることを意味します。ユニバーサルカーネルのさまざまな特性を提供し、いくつかの実用的な重要性を持ついくつかの具体的な例でそれらを強調します。私たちの分析は、関数分析の基本原則、特に結果を明確にするために十分に詳細に説明するベクトル測度の有用な概念を使用します。

Dynamic Hierarchical Markov Random Fields for Integrated Web Data Extraction
統合Webデータ抽出のための動的階層マルコフ確率場

Existing template-independent web data extraction approaches adopthighly ineffective decoupled strategies—attempting to do datarecord detection and attribute labeling in two separate phases. Inthis paper, we propose an integrated web data extraction paradigmwith hierarchical models. The proposed model is called DynamicHierarchical Markov Random Fields (DHMRFs). DHMRFs take structuraluncertainty into consideration and define a joint distribution ofboth model structure and class labels. The joint distribution is anexponential family distribution. As a conditional model, DHMRFsrelax the independence assumption as made in directed models. Sinceexact inference is intractable, a variational method is developed tolearn the model’s parameters and to find the MAP model structure andlabel assignments. We apply DHMRFs to a real-world web dataextraction task. Experimental results show that: (1) integrated webdata extraction models can achieve significant improvements on bothrecord detection and attribute labeling compared to decoupledmodels; (2) in diverse web data extraction DHMRFs can potentiallyaddress the blocky artifact issue which is suffered byfixed-structured hierarchical models.

既存のテンプレートに依存しないWebデータ抽出アプローチは、非常に非効率的な分離戦略を採用しており、データレコードの検出と属性のラベル付けを2つの別々のフェーズで実行しようとしています。この論文では、階層モデルを使用した統合Webデータ抽出パラダイムを提案します。提案モデルは、DynamicHierarchical Markov Random Fields (DHMRF)と呼ばれます。DHMRFは構造的不確実性を考慮し、モデル構造とクラスラベルの両方の結合分布を定義します。結合分布は指数族分布です。条件付きモデルとして、DHMRFは有向モデルで行われた独立性の仮定を緩和します。正確な推論は扱いにくいため、モデルのパラメーターを学習し、MAPモデル構造とラベルの割り当てを見つけるために変分法が開発されています。DHMRFを実際のWebデータ抽出タスクに適用します。実験結果から、(1)統合Webデータ抽出モデルは、分離モデルと比較して、レコード検出と属性のラベル付けの両方で大幅な改善を達成できることがわかります。（２）多様なウェブデータ抽出において、DHMRFは固定構造の階層モデルで発生するブロック状のアーティファクトの問題を解決できる可能性があります。

Aggregation of SVM Classifiers Using Sobolev Spaces
ソボレフ空間を使用した SVM 分類器の集約

This paper investigates statistical performances of Support VectorMachines (SVM) and considers the problem of adaptation to the marginparameter and to complexity. In particular we provide a classifierwith no tuning parameter. It is a combination of SVM classifiers.Our contribution is two-fold: (1) we propose learning rates for SVMusing Sobolev spaces and build a numerically realizable aggregate thatconverges with same rate; (2) we present practical experiments of thismethod of aggregation for SVM using both Sobolev spaces and Gaussiankernels.

この論文では、Support VectorMachines(SVM)の統計的パフォーマンスを調査し、marginパラメータへの適応と複雑さの問題について考察します。特に、チューニングパラメータのない分類子を提供します。これは、SVM分類子の組み合わせです。私たちの貢献は2つあります:(1)Sobolev空間を使用したSVMの学習率を提案し、同じ速度で収束する数値的に実現可能な集合体を構築します。(2)このSVMの凝集方法について、Sobolev空間とGaussiankernelsの両方を用いた実践的な実験を行う。

Learning to Combine Motor Primitives Via Greedy Additive Regression
貪欲な加法回帰による運動プリミティブの結合の学習

The computational complexities arising in motor control can beameliorated through the use of a library of motor synergies. Wepresent a new model, referred to as the Greedy Additive Regression(GAR) model, for learning a library of torque sequences, and forlearning the coefficients of a linear combination of sequencesminimizing a cost function. From the perspective of numericaloptimization, the GAR model is interesting because it creates alibrary of “local features”—each sequence in the library is asolution to a single training task—and learns to combine thesesequences using a local optimization procedure, namely, additiveregression. We speculate that learners with local representationalprimitives and local optimization procedures will show goodperformance on nonlinear tasks. The GAR model is also interestingfrom the perspective of motor control because it outperforms severalcompeting models. Results using a simulated two-joint arm suggestthat the GAR model consistently shows excellent performance in thesense that it rapidly learns to perform novel, complex motor tasks.Moreover, its library is overcomplete and sparse, meaning that onlya small fraction of the stored torque sequences are used whenlearning a new movement. The library is also robust in the sensethat, after an initial training period, nearly all novel movementscan be learned as additive combinations of sequences in the library,and in the sense that it shows good generalization when an arm’sdynamics are altered between training and test conditions, such aswhen a payload is added to the arm. Lastly, the GAR model works wellregardless of whether motor tasks are specified in joint space orCartesian space. We conclude that learning techniques using localprimitives and optimization procedures are viable and potentiallyimportant methods for motor control and possibly other domains, andthat these techniques deserve further examination by the artificialintelligence and cognitive science communities.

モーター制御で生じる計算の複雑さは、モーター相乗効果のライブラリを使用することで改善できます。トルクシーケンスのライブラリを学習し、コスト関数を最小化するシーケンスの線形結合の係数を学習するための、貪欲加法回帰(GAR)モデルと呼ばれる新しいモデルを紹介します。数値最適化の観点から見ると、GARモデルは「ローカル機能」のライブラリ(ライブラリ内の各シーケンスは単一のトレーニングタスクのソリューション)を作成し、ローカル最適化手順、つまり加法回帰を使用してこれらのシーケンスを組み合わせることを学習するため興味深いものです。ローカル表現プリミティブとローカル最適化手順を持つ学習者は、非線形タスクで優れたパフォーマンスを発揮すると推測されます。GARモデルは、いくつかの競合モデルよりも優れているため、モーター制御の観点からも興味深いものです。シミュレートされた2関節アームを使用した結果から、GARモデルは、新しい複雑な運動タスクを迅速に学習するという点で、一貫して優れたパフォーマンスを示すことが示唆されています。さらに、そのライブラリは過剰に完全でスパースであるため、新しい動作を学習するときには、保存されているトルクシーケンスのごく一部しか使用されません。ライブラリは、最初のトレーニング期間の後、ほぼすべての新しい動作をライブラリ内のシーケンスの付加的な組み合わせとして学習できるという点で堅牢であり、アームのダイナミクスがトレーニング条件とテスト条件の間で変更された場合(アームにペイロードが追加された場合など)、優れた一般化を示すという点で堅牢です。最後に、GARモデルは、運動タスクが関節空間で指定されているか直交空間で指定されているかに関係なく、適切に機能します。ローカルプリミティブと最適化手順を使用した学習手法は、運動制御および場合によっては他の領域で実行可能で潜在的に重要な方法であり、これらの手法は人工知能および認知科学コミュニティによってさらに検討される価値があると結論付けています。

Incremental Identification of Qualitative Models of Biological Systems using Inductive Logic Programming
帰納的論理プログラミングを用いた生体システムの定性的モデルの漸進的同定

The use of computational models isincreasingly expected to play an important role in predicting thebehaviour of biological systems.Models are being sought at different scales of biologicalorganisation namely: sub-cellular,cellular, tissue, organ, organism and ecosystem; with a view ofidentifying how different components are connected together, how theyare controlled and how they behave when functioning as a system. Exceptfor very simple biological processes, system identification fromfirst principles can be extremely difficult. This has brought intofocus automated techniques for constructingmodels using data of system behaviour. Such techniques face threeprincipal issues: (1) The model representation language must be rich enoughto capture system behaviour; (2) The system identification techniquemust be powerful enough to identify substantially complex models; and (3)There may not be sufficient data to obtain both the model’s structureand precise estimates of all of its parameters. In this paper, weaddress these issuesin the following ways: (1) Models are represented in an expressivesubset of first-order logic. Specifically, they are expressed aslogic programs; (2) System identification is done using techniquesdeveloped in Inductive Logic Programming (ILP). This allows theidentification of first-order logic models from data. Specifically, weemploy an incremental approach in which increasingly complex models areconstructed from simpler ones using snapshots of system behaviour; and(3) We restrict ourselves to “qualitative” models. These arenon-parametric: thus, usually less data are required than foridentifying parametric quantitative models. A further advantageis that the data need not be precise numerical observations(instead, they are abstractions like positive, negative, zero,increasing, decreasing and so on).We describe incremental constructionof qualitative modelsusing a simple physical system and demonstrate its applicationto identification of models at four scales of biologicalorganisation, namely: (a) a predator-prey model at the ecosystem level;(b) a model for the human lung at the organ level; (c) a model for regulationof glucose by insulin in the human body at the extra-cellularlevel; and (d) a model forthe glycolysis metabolic pathway at the cellular level.

計算モデルの使用は、生物システムの挙動を予測する上で重要な役割を果たすことがますます期待されています。モデルは、細胞内、細胞、組織、器官、生物、生態系など、生物組織のさまざまなスケールで求められています。これは、さまざまなコンポーネントがどのように相互に関連し、どのように制御され、システムとして機能するときにどのように動作するかを特定することを目的としています。非常に単純な生物学的プロセスを除いて、第一原理からのシステム識別は極めて困難です。このため、システム動作のデータを使用してモデルを構築する自動化技術に注目が集まっています。このような技術には、次の3つの主な問題があります。(1)モデル表現言語は、システム動作を捉えるのに十分な機能を備えている必要があります。(2)システム識別技術は、かなり複雑なモデルを識別できるほど強力である必要があります。(3)モデルの構造と、そのすべてのパラメーターの正確な推定値の両方を取得するのに十分なデータがない場合があります。この論文では、これらの問題を次の方法で取り上げます。(1)モデルは、一階述語論理の表現力豊かなサブセットで表現されます。具体的には、論理プログラムとして表現されます。(2)システム識別は、帰納的論理プログラミング(ILP)で開発された手法を使用して行われます。これにより、データから1次論理モデルを識別できます。具体的には、システムの動作のスナップショットを使用して、より単純なモデルから徐々に複雑なモデルを構築する増分アプローチを採用しています。(3)「定性的」モデルに限定しています。これらは非パラメトリックであるため、通常、パラメトリック定量的モデルを識別する場合よりも必要なデータが少なくなります。さらに、データは正確な数値観測である必要がないという利点があります(代わりに、正、負、ゼロ、増加、減少などの抽象化です)。単純な物理システムを使用して定性的モデルの増分構築について説明し、生物学的組織の4つのスケールでのモデルの識別への適用を示します。つまり、(a)生態系レベルの捕食者と被食者モデル、(b)臓器レベルの人間の肺のモデル、(c)細胞外レベルの人間の体内でのインスリンによるグルコースの調節モデルです。（d）細胞レベルでの解糖代謝経路のモデル。

Causal Reasoning with Ancestral Graphs
祖先グラフによる因果推論

Causal reasoning is primarily concerned with what wouldhappen to a system under external interventions. In particular, weare often interested in predicting the probability distribution ofsome random variables that would result if some other variableswere forced to take certain values. One prominent approachto tackling this problem is based on causal Bayesian networks,using directed acyclic graphs as causal diagrams to relatepost-intervention probabilities to pre-intervention probabilitiesthat are estimable from observational data. However, such causaldiagrams are seldom fully testable given observational data. Inconsequence, many causal discovery algorithms based on data-miningcan only output an equivalence class of causal diagrams (ratherthan a single one). This paper is concerned with causal reasoninggiven an equivalence class of causal diagrams, represented by a(partial) ancestral graph. We present two main results. Thefirst result extends Pearl (1995)’s celebrated do-calculusto the context of ancestral graphs. In the second result, we focuson a key component of Pearl’s calculus—the property of invariance under interventions, and give stronger graphicalconditions for this property than those implied by the firstresult. The second result also improves the earlier, similarresults due to Spirtes et al. (1993).

因果推論は、主に外部介入を受けた場合にシステムに何が起こるかに関係しています。特に、他の変数が特定の値を取るように強制された場合に生じるランダム変数の確率分布を予測することに関心があることがよくあります。この問題に取り組むための1つの主要なアプローチは、因果ベイジアンネットワークに基づくもので、有向非巡回グラフを因果図として使用して、観察データから推定可能な介入後の確率を介入前の確率に関連付けます。ただし、このような因果図は、観察データが与えられても完全にテストできることはほとんどありません。その結果、データマイニングに基づく多くの因果発見アルゴリズムは、因果図の同値クラス(1つではなく)しか出力できません。この論文では、(部分的な)祖先グラフで表される因果図の同値クラスが与えられた場合の因果推論に関するものです。2つの主な結果を示します。最初の結果は、パール(1995)の有名なdo計算を祖先グラフのコンテキストに拡張したものです。2番目の結果では、パールの計算の重要な要素である介入に対する不変性の特性に焦点を当て、この特性に対して最初の結果で示唆されたものよりも強力なグラフ条件を示します。2番目の結果は、Spirtesら(1993)による以前の同様の結果も改善しています。

Online Learning of Complex Prediction Problems Using Simultaneous Projections
同時射影を用いた複雑な予測問題のオンライン学習

We describe and analyze an algorithmic framework for online classificationwhere each online trial consists of multiple prediction tasks that aretied together. We tackle the problem of updating the online predictor bydefining a projection problem in which each prediction task corresponds to asingle linear constraint. These constraints are tied together through a singleslack parameter. We then introduce a general method for approximately solvingthe problem by projecting simultaneously and independently on eachconstraint which corresponds to a prediction sub-problem, and then averagingthe individual solutions. We show that this approach constitutes a feasible,albeit not necessarily optimal, solution of the original projection problem.We derive concrete simultaneous projection schemes and analyze them in the mistakebound model. We demonstrate the power of the proposed algorithm in experimentswith synthetic data and with multiclass text categorization tasks.

私たちは、各オンライン試行が互いに結びついた複数の予測タスクで構成されるオンライン分類のためのアルゴリズムフレームワークを説明し、分析します。オンライン予測子の更新の問題に取り組み、各予測タスクが単一の線形制約に対応する射影問題を定義します。これらの制約は、1つのslackパラメーターによって結び付けられます。次に、予測サブ問題に対応する各制約に同時かつ独立して投影し、個々の解を平均化することにより、問題を近似的に解くための一般的な方法を紹介します。このアプローチが、必ずしも最適ではないにしても、元の投影問題の実行可能な解決策を構成することを示します。具体的な同時射影スキームを導出し、ミスバウンドモデルで解析します。提案されたアルゴリズムの力を、合成データと多クラステキスト分類タスクを使用した実験で実証します。

Coordinate Descent Method for Large-scale L2-loss Linear Support Vector Machines
大規模L2損失線形サポートベクトルマシンの座標降下法

Linear support vector machines (SVM) are useful for classifyinglarge-scale sparse data. Problems with sparse features are commonin applications such as document classification and natural languageprocessing. In this paper, we propose a novel coordinate descentalgorithm for training linear SVM with the L2-loss function. At each step, the proposed method minimizes a one-variable sub-problemwhile fixing other variables. The sub-problem is solved by Newtonsteps with the line search technique. The procedure globallyconverges at the linear rate. As each sub-problem involves only values of a corresponding feature, the proposed approach is suitable when accessing a feature is more convenient than accessing an instance.Experiments show that our method is moreefficient and stable than state of the art methods such as Pegasosand TRON.

線形サポートベクターマシン(SVM)は、大規模なスパースデータの分類に役立ちます。スパース特徴の問題は、ドキュメント分類や自然言語処理などのアプリケーションで一般的です。この論文では、L2損失関数を使用して線形SVMを訓練するための新しい座標降下アルゴリズムを提案します。各ステップで、提案された方法は、他の変数を修正しながら、1つの変数のサブ問題を最小化します。サブ問題は、ライン探索手法を使用してNewtonstepsによって解決されます。プロシージャーは、線形レートでグローバルに収束します。各サブ問題には対応する特徴の値のみが含まれるため、提案されたアプローチは、特徴へのアクセスがインスタンスへのアクセスよりも便利な場合に適しています。実験によると、当社の方法は、Pegasosand TRONなどの最先端の方法よりも効率的で安定しています。

A Bahadur Representation of the Linear Support Vector Machine
線形サポートベクトルマシンのバハドゥール表現

The support vector machine has been successful in a variety ofapplications. Also on the theoretical front, statistical propertiesof the support vector machine have been studied quite extensivelywith a particular attention to its Bayes risk consistency under someconditions. In this paper, we study somewhat basic statisticalproperties of the support vector machine yet to be investigated,namely the asymptotic behavior of the coefficients of the linearsupport vector machine. A Bahadur type representation of thecoefficients is established under appropriate conditions, and theirasymptotic normality and statistical variability are derived on thebasis of the representation. These asymptotic results do not onlyhelp further our understanding of the support vector machine, butalso they can be useful for related statistical inferences.

サポートベクターマシンは、さまざまなアプリケーションで成功を収めています。また、理論的な面では、サポートベクターマシンの統計的特性は、いくつかの条件下でのベイズリスクの一貫性に特に注意を払って、かなり広範囲に研究されています。この論文では、まだ調査されていないサポートベクターマシンのやや基本的な統計特性、つまり線形サポートベクターマシンの係数の漸近的な振る舞いを研究します。係数のバハドゥール型表現は、適切な条件下で確立され、その表現に基づいて漸近正規性と統計的変動性が導出されます。これらの漸近的な結果は、サポートベクターマシンの理解を深めるのに役立つだけでなく、関連する統計的推論にも役立ちます。

Using Markov Blankets for Causal Structure Learning
因果構造学習のためのマルコフブランケットの使用

We show how a generic feature-selection algorithm returning stronglyrelevant variables can be turned into a causal structure-learningalgorithm. We prove this under the Faithfulness assumption for thedata distribution. In a causal graph, the strongly relevant variablesfor a node X are its parents, children, and children’s parents (orspouses), also known as the Markov blanket of X. Identifying thespouses leads to the detection of the V-structure patterns and thus tocausal orientations. Repeating the task for all variables yields avalid partially oriented causal graph. We first show an efficient wayto identify the spouse links. We then perform several experiments inthe continuous domain using the Recursive Feature Eliminationfeature-selection algorithm with Support Vector Regression andempirically verify the intuition of this direct (but computationallyexpensive) approach. Within the same framework, we then devise a fastand consistent algorithm, Total Conditioning (TC), and a variant,TCbw, with an explicit backward feature-selection heuristics, forGaussian data. After running a series of comparative experiments onfive artificial networks, we argue that Markov blanket algorithms suchas TC/TCbw or Grow-Shrink scale better than the reference PCalgorithm and provides higher structural accuracy.

私たちは、強く関連のある変数を返す一般的な特徴選択アルゴリズムを、因果構造学習アルゴリズムに変換する方法を示します。これは、データ分布の忠実性仮定の下で証明します。因果グラフでは、ノードXに強く関連のある変数は、その親、子、子の親(または配偶者)であり、Xのマルコフブランケットとも呼ばれます。配偶者を識別すると、V構造パターンが検出され、因果方向がわかります。すべての変数に対してこのタスクを繰り返すと、有効な部分的に方向付けられた因果グラフが生成されます。最初に、配偶者のリンクを識別する効率的な方法を示します。次に、サポートベクター回帰による再帰的特徴除去特徴選択アルゴリズムを使用して連続領域でいくつかの実験を実行し、この直接的な(ただし計算コストが高い)アプローチの直感を経験的に検証します。同じフレームワーク内で、ガウスデータ用の高速で一貫性のあるアルゴリズムであるTotal Conditioning (TC)と、明示的な後方特徴選択ヒューリスティックを備えたバリアントであるTCbwを考案しました。5つの人工ネットワークで一連の比較実験を実行した後、TC/TCbwやGrow-Shrinkなどのマルコフブランケットアルゴリズムは、参照PCalgorithmよりもスケールが良く、構造精度が高いと主張しています。

Optimal Solutions for Sparse Principal Component Analysis
スパース主成分分析の最適解

Given a sample covariance matrix, we examine the problem of maximizingthe variance explained by a linear combination of the input variableswhile constraining the number of nonzero coefficients in thiscombination. This is known as sparse principal component analysis andhas a wide array of applications in machine learning andengineering. We formulate a new semidefinite relaxation to thisproblem and derive a greedy algorithm that computes a full setof good solutions for all target numbers of non zero coefficients,with total complexity O(n3), where nis the number of variables. We then use the same relaxation to derivesufficient conditions for global optimality of a solution, which canbe tested in O(n3), per pattern. We discussapplications in subset selection and sparse recovery and show onartificial examples and biological data that our algorithm doesprovide globally optimal solutions in many cases.

サンプルの共分散行列が与えられた場合、入力変数の線形結合によって説明される分散を最大化する問題を検討し、この組み合わせでゼロ以外の係数の数を制約します。これはスパース主成分分析として知られており、機械学習やエンジニアリングに幅広く応用されています。この問題に対する新しい半定値緩和を定式化し、ゼロでない係数のすべてのターゲット数に対して、総複雑性O(n3) (nisは変数の数)で良好な解のフルセットを計算する貪欲なアルゴリズムを導き出します。次に、同じ緩和を使用して、パターンごとにO(n3)でテストできる溶液の大域的最適性のための十分な条件を導き出します。サブセット選択とスパース回復のアプリケーションについて説明し、私たちのアルゴリズムが多くの場合、グローバルに最適なソリューションを提供する人工的な例と生物学的データを示します。

Maximal Causes for Non-linear Component Extraction
非線形成分抽出の最大原因

We study a generative model in which hidden causes combinecompetitively to produce observations. Multiple active causes combineto determine the value of an observed variable through a maxfunction, in the place where algorithms such as sparse coding,independent component analysis, or non-negative matrix factorizationwould use a sum. This max rule can represent a more realisticmodel of non-linear interaction between basic components in manysettings, including acoustic and image data.While exact maximum-likelihood learning of the parameters of thismodel proves to be intractable, we show that efficientapproximations to expectation-maximization (EM) can be found in thecase of sparsely active hidden causes. One of these approximationscan be formulated as a neural network model with a generalized softmaxactivation function and Hebbian learning.Thus, we show that learning in recent softmax-like neural networks maybe interpreted as approximate maximization of a data likelihood.We use the bars benchmark test to numerically verify our analyticalresults and to demonstrate the competitiveness of the resultingalgorithms.Finally, we show results of learning model parameters to fit acousticand visual data sets in which max-like component combinations arisenaturally.

私たちは、隠れた原因が競合的に組み合わさって観測値を生成する生成モデルを研究します。スパースコーディング、独立成分分析、非負行列分解などのアルゴリズムが合計を使用する場所で、複数のアクティブな原因が組み合わさって、最大関数を介して観測変数の値を決定します。この最大ルールは、音響データや画像データを含む多くの設定で、基本コンポーネント間の非線形相互作用のより現実的なモデルを表すことができます。このモデルのパラメータの正確な最大尤度学習は困難であることが判明していますが、スパースにアクティブな隠れた原因の場合、期待値最大化(EM)の効率的な近似値を見つけることができることを示しています。これらの近似の1つは、一般化されたソフトマックス活性化関数とヘブビアン学習を備えたニューラルネットワークモデルとして定式化できます。したがって、最近のソフトマックスのようなニューラルネットワークでの学習は、データ尤度の近似最大化として解釈できることを示します。バーベンチマークテストを使用して、分析結果を数値的に検証し、結果として得られるアルゴリズムの競争力を実証します。最後に、maxのようなコンポーネントの組み合わせが自然に発生する音響および視覚データセットに適合するようにモデルパラメータを学習した結果を示します。

Consistency of the Group Lasso and Multiple Kernel Learning
グループラッソと複数カーネル学習の一貫性

We consider the least-square regression problem with regularization bya block l1-norm, that is, a sum of Euclidean norms over spacesof dimensions larger than one. This problem, referred to as the groupLasso, extends the usual regularization by the l1-norm where allspaces have dimension one, where it is commonly referred to as theLasso. In this paper, we study the asymptotic group selectionconsistency of the group Lasso. We derive necessary and sufficientconditions for the consistency of group Lasso under practicalassumptions, such as model mis specification. When the linearpredictors and Euclidean norms are replaced by functions andreproducing kernel Hilbert norms, the problem is usually referred toas multiple kernel learning and is commonly used for learning fromheterogeneous data sources and for non linear variableselection. Using tools from functional analysis, and in particularcovar iance operators, we extend the consistency results to thisinfinite dimensional case and also propose an adaptive scheme toobtain a consistent model estimate, even when the necessary conditionrequired for the non adaptive scheme is not satisfied.

私たちは、ブロックl1ノルム、つまり1より大きい次元の空間上のユークリッドノルムの合計による正則化を伴う最小二乗回帰問題を検討します。グループLassoと呼ばれるこの問題は、すべての空間が次元1を持つl1ノルムによる通常の正則化を拡張したもので、一般的にLassoと呼ばれています。この論文では、グループLassoの漸近的グループ選択一貫性について検討します。モデルが誤って指定されているなどの実際的な仮定の下で、グループLassoの一貫性に必要な条件と十分な条件を導出します。線形予測子とユークリッドノルムが関数と再生カーネルヒルベルトノルムに置き換えられた場合、この問題は通常、多重カーネル学習と呼ばれ、異種データソースからの学習や非線形変数選択によく使用されます。関数解析のツール、特に共分散演算子を使用して、この無限次元のケースに一貫性の結果を拡張し、非適応型スキームに必要な条件が満たされない場合でも、一貫性のあるモデル推定値を取得するための適応型スキームを提案します。

Cross-Validation Optimization for Large Scale Structured Classification Kernel Methods
大規模構造化分類カーネル法のための交差検証の最適化

We propose a highly efficient framework for penalized likelihood kernelmethods applied to multi-class models with a large, structured set of classes.As opposed to many previous approaches which try to decompose the fittingproblem into many smaller ones, we focus on a Newton optimization of thecomplete model, making use of model structure and linear conjugate gradientsin order to approximate Newton search directions. Crucially, our learningmethod is based entirely on matrix-vector multiplication primitives with thekernel matrices and their derivatives, allowing straightforward specializationto new kernels, and focusing code optimization efforts to these primitivesonly.Kernel parameters are learned automatically, by maximizing the cross-validationlog likelihood in a gradient-based way, and predictive probabilities areestimated. We demonstrate our approach on large scale text classificationtasks with hierarchical structure on thousands of classes, achievingstate-of-the-art results in an order of magnitude less time than previouswork.Parts of this work appeared in the conference paper Seeger (2007).

私たちは、大規模で構造化されたクラスのセットを持つマルチクラスモデルに適用されるペナルティ付き尤度カーネル法の非常に効率的なフレームワークを提案します。フィッティング問題を多数の小さな問題に分解しようとする多くの従来のアプローチとは対照的に、我々はニュートン探索方向を近似するためにモデル構造と線形共役勾配を利用して、完全なモデルのニュートン最適化に焦点を当てます。重要なことは、我々の学習方法がカーネル行列とその導関数を含む行列ベクトル乗算プリミティブに完全に基づいている点です。これにより、新しいカーネルへの直接的な特殊化が可能になり、コード最適化の取り組みはこれらのプリミティブのみに集中します。カーネルパラメーターは、勾配ベースの方法で交差検証ログ尤度を最大化することによって自動的に学習され、予測確率が推定されます。私たちは、数千のクラスに階層構造を持つ大規模なテキスト分類タスクで我々のアプローチを実証し、以前の研究よりも桁違いに短い時間で最先端の結果を達成しました。この研究の一部は、会議論文Seeger (2007)に掲載されました。

A Multiple Instance Learning Strategy for Combating Good Word Attacks on Spam Filters
スパムフィルタに対する良い言葉の攻撃に対抗するための複数インスタンス学習戦略

Statistical spam filters are known to be vulnerable to adversarialattacks. One of the more common adversarial attacks, known asthe good word attack, thwarts spam filters by appending tospam messages sets of “good” words, which are words that are commonin legitimate email but rare in spam. We present a counterattackstrategy that attempts to differentiate spam fromlegitimate email in the input space by transforming each emailinto a bag of multiple segments, and subsequently applyingmultiple instance logistic regression on the bags. Wetreat each segment in the bag as an instance. An emailis classified as spam if at least one instance in the correspondingbag is spam, and as legitimate if all the instancesin it are legitimate. We show that a classifier using ourmultiple instance counterattack strategy is more robustto good word attacks than its single instance counterpartand other single instance learners commonly used in the spam filtering domain.

統計スパムフィルタは、敵対的攻撃に対して脆弱であることが知られています。より一般的な敵対的攻撃の1つであるGood Word攻撃は、スパムメッセージに追加してスパムフィルターを阻止します。これは、正当な電子メールでは一般的だが、スパムではまれな単語である「良い」単語のセットです。各電子メールを複数のセグメントのバッグに変換し、その後、バッグに複数のインスタンスロジスティック回帰を適用することにより、入力空間でスパムと正当な電子メールを区別しようとする対抗攻撃戦略を提示します。バッグ内の各セグメントをインスタンスとして扱います。メールは、対応するバッグ内の少なくとも1つのインスタンスがスパムである場合にスパムとして分類され、その中のすべてのインスタンスが正当である場合に正当として分類されます。複数インスタンスの反撃戦略を使用する分類子は、スパムフィルタリングドメインで一般的に使用される単一インスタンスの対応者や他の単一インスタンスの学習者よりも、良い単語攻撃に対してより堅牢であることを示します。

Ranking Categorical Features Using Generalization Properties
汎化プロパティを使用したカテゴリ特徴のランク付け

Feature ranking is a fundamental machine learning task with variousapplications, including feature selection and decision tree learning.We describe and analyze a new feature ranking method that supportscategorical features with a large number of possible values. We showthat existing ranking criteria rank a feature according to thetraining error of a predictor based on the feature. Thisapproach can fail when ranking categorical features with many values.We propose the Ginger ranking criterion, that estimates thegeneralization error of the predictor associated with the Giniindex. We show that for almost all training sets, the Gingercriterion produces an accurate estimation of the true generalizationerror, regardless of the number of values in a categorical feature. Wealso address the question of finding the optimal predictor that isbased on a single categorical feature. It is shown that the predictorassociated with the misclassification error criterion has the minimalexpected generalization error. We bound the bias of this predictorwith respect to the generalization error of the Bayes optimalpredictor, and analyze its concentration properties. We demonstratethe efficiency of our approach for feature selection and for learningdecision trees in a series of experiments with synthetic and naturaldata sets.

特徴のランク付けは、特徴選択や決定木学習など、さまざまな用途を持つ基本的な機械学習タスクです。ここでは、多数の可能な値を持つカテゴリ特徴をサポートする新しい特徴ランク付け方法について説明し、分析します。既存のランク付け基準は、特徴に基づく予測子のトレーニングエラーに従って特徴をランク付けすることを示します。このアプローチは、多くの値を持つカテゴリ特徴をランク付けするときに失敗する可能性があります。ここでは、Giniインデックスに関連付けられた予測子の一般化エラーを推定するGingerランク付け基準を提案します。ほとんどすべてのトレーニングセットで、Ginger基準は、カテゴリ特徴の値の数に関係なく、真の一般化エラーの正確な推定値を生成することを示します。また、単一のカテゴリ特徴に基づく最適な予測子を見つける問題にも対処します。誤分類エラー基準に関連付けられた予測子は、予想される一般化エラーが最小であることが示されています。ベイズ最適予測子の一般化エラーに関して、この予測子のバイアスを制限し、その集中特性を分析します。私たちは、合成データセットと自然データセットを使用した一連の実験で、特徴選択と決定木の学習に対する私たちのアプローチの効率性を実証します。

Learning Similarity with Operator-valued Large-margin Classifiers
演算子値による大マージン分類器による類似性の学習

A method is introduced to learn and represent similarity with linearoperators in kernel induced Hilbert spaces. Transferring error bounds forvector valued large-margin classifiers to the setting of Hilbert-Schmidtoperators leads to dimension free bounds on a risk functional for linearrepresentations and motivates a regularized objective functional.Minimization of this objective is effected by a simple technique ofstochastic gradient descent. The resulting representations are tested ontransfer problems in image processing, involving plane and spatial geometricinvariants, handwritten characters and face recognition.

カーネル誘導ヒルベルト空間の線形演算子との類似性を学習し、表現するための方法が導入されています。ベクトル値を持つ大マージン分類器の誤差範囲をヒルベルト・シュミット演算子の設定に転送すると、線形表現のリスク汎関数の次元自由境界が得られ、正則化された目的関数が動機付けられます。この目的の最小化は、確率的勾配降下法の単純な手法によってもたらされます。結果として得られる表現は、平面および空間の幾何学的不変量、手書きの文字、顔認識を含む画像処理の転送問題でテストされます。

Consistency of Trace Norm Minimization
トレースノルムの最小化の一貫性

Regularization by the sum of singular values, also referred to as thetrace norm, is a popular technique for estimating low rankrectangular matrices. In this paper, we extend some of the consistencyresults of the Lasso to provide necessary and sufficient conditionsfor rank consistency of trace norm minimization with the square loss.We also provide an adaptive version that is rank consistent even whenthe necessary condition for the non adaptive version is not fulfilled.

特異値の合計による正則化は、トレースノルムとも呼ばれ、低ランクの長方形行列を推定するための一般的な手法です。この論文では、Lassoの一貫性結果の一部を拡張して、トレースノルムの最小化のランク一貫性と二乗損失の必要十分な条件を提供します。また、非適応バージョンの必要な条件が満たされていない場合でもランクが一貫している適応型バージョンも提供しています。

Hit Miss Networks with Applications to Instance Selection
インスタンス選択への応用を伴うヒットミスネットワーク

In supervised learning, a training set consisting of labeled instancesis used by a learning algorithm for generating a model (classifier)that is subsequently employed for deciding the class label of newinstances (for generalization). Characteristics of the training set,such as presence of noisy instances and size, influence the learningalgorithm and affect generalization performance. This paper introducesa new network-based representation of a training set, called hit missnetwork (HMN), which provides a compact description of the nearestneighbor relation over pairs of instances from each pair of classes.We show that structural properties of HMN’s correspond to propertiesof training points related to the one nearest neighbor (1-NN) decisionrule, such as being border or central point. This motivates us to useHMN’s for improving the performance of a 1-NN, classifier by removinginstances from the training set (instance selection). We introducethree new HMN-based algorithms for instance selection. HMN-C, whichremoves instances without affecting accuracy of 1-NN on the originaltraining set, HMN-E, based on a more aggressive storage reduction, andHMN-EI, which applies iteratively HMN-E. Their performance is assessedon 22 data sets with different characteristics, such as inputdimension, cardinality, class balance, number of classes, noisecontent, and presence of redundant variables. Results of experimentson these data sets show that accuracy of 1-NN classifier increasessignificantly when HMN-EI is applied. Comparison withstate-of-the-art editing algorithms for instance selection on thesedata sets indicates best generalization performance of HMN-EI and nosignificant difference in storage requirements. In general, theseresults indicate that HMN’s provide a powerful graph-basedrepresentation of a training set, which can be successfully appliedfor performing noise and redundance reduction in instance-basedlearning.

教師あり学習では、ラベル付けされたインスタンスで構成されるトレーニングセットが、学習アルゴリズムによってモデル(分類器)を生成するために使用されます。このモデルは、その後、新しいインスタンスのクラスラベルを決定するために使用されます(一般化用)。ノイズの多いインスタンスの存在やサイズなどのトレーニングセットの特性は、学習アルゴリズムに影響し、一般化のパフォーマンスに影響します。この論文では、ヒットミスネットワーク(HMN)と呼ばれる、トレーニングセットの新しいネットワークベースの表現を紹介します。これは、各クラスのペアからのインスタンスのペアの最近傍関係を簡潔に記述します。HMNの構造特性は、境界または中心点であるなど、1近傍(1-NN)決定ルールに関連するトレーニングポイントの特性に対応することを示します。これが、トレーニングセットからインスタンスを削除することによって1-NN分類器のパフォーマンスを向上させるためにHMNを使用する動機となります(インスタンス選択)。インスタンス選択用の3つの新しいHMNベースのアルゴリズムを紹介します。HMN-Cは、元のトレーニングセットの1-NNの精度に影響を与えずにインスタンスを削除します。HMN-Eは、より積極的なストレージ削減に基づいています。HMN-EIは、HMN-Eを繰り返し適用します。これらのパフォーマンスは、入力次元、カーディナリティ、クラスバランス、クラス数、ノイズコンテンツ、冗長変数の存在など、異なる特性を持つ22のデータセットで評価されます。これらのデータセットでの実験の結果、HMN-EIを適用すると1-NN分類器の精度が大幅に向上することがわかりました。これらのデータセットでのインスタンス選択の最先端の編集アルゴリズムと比較すると、HMN-EIの一般化パフォーマンスが最高であり、ストレージ要件に大きな違いがないことがわかります。一般に、これらの結果は、HMNがトレーニングセットの強力なグラフベース表現を提供し、インスタンスベース学習でノイズと冗長性の削減を実行するためにうまく適用できることを示しています。

Shark
鮫

SHARK is an object-oriented library for the design of adaptivesystems. It comprises methods for single- and multi-objectiveoptimization (e.g., evolutionary and gradient-based algorithms) aswell as kernel-based methods, neural networks, and other machinelearning techniques.

SHARKは、適応システムの設計のためのオブジェクト指向ライブラリです。これは、単一目的および多目的最適化の方法(進化的および勾配ベースのアルゴリズムなど)のほか、カーネルベースの方法、ニューラルネットワーク、およびその他の機械学習技術で構成されています。

Search for Additive Nonlinear Time Series Causal Models
加法非線形時系列因果モデルの探索

Pointwise consistent, feasible procedures for estimatingcontemporaneous linear causal structure from time series data havebeen developed using multiple conditional independence tests, but nosuch procedures are available for non-linear systems. We describe afeasible procedure for learning a class of non-linear time seriesstructures, which we call additive non-linear time series. We showthat for data generated from stationary models of this type, twoclasses of conditional independence relations among time seriesvariables and their lags can be tested efficiently and consistentlyusing tests based on additive model regression. Combining results ofstatistical tests for these two classes of conditional independencerelations and the temporal structure of time series data, a newconsistent model specification procedure is able to extract relativelydetailed causal information. We investigate the finite sample behaviorof the procedure through simulation, and illustrate the application ofthis method through analysis of the possible causal connections amongfour ocean indices. Several variants of the procedure are alsodiscussed.

時系列データから同時線形因果構造を推定するための、点ごとに一貫性のある実行可能な手順は、複数の条件付き独立性検定を使用して開発されていますが、非線形システムにはそのような手順はありません。ここでは、非線形時系列構造のクラスを学習するための実行可能な手順について説明します。このクラスを、加法非線形時系列と呼びます。このタイプの定常モデルから生成されたデータの場合、時系列変数とそのラグ間の条件付き独立関係の2つのクラスを、加法モデル回帰に基づく検定を使用して効率的かつ一貫して検定できることを示します。これらの2つのクラスの条件付き独立関係と時系列データの時間的構造の統計検定の結果を組み合わせると、新しい一貫性のあるモデル仕様手順によって、比較的詳細な因果情報を抽出できます。シミュレーションによって手順の有限サンプル動作を調査し、4つの海洋指標間の可能な因果関係の分析によってこの方法の適用を示します。手順のいくつかのバリエーションについても説明します。

Accelerated Neural Evolution through Cooperatively Coevolved Synapses
協調共進化シナプスによる神経進化の加速

Many complex control problems require sophisticated solutions that are notamenable to traditional controller design. Not only is it difficultto model real world systems, but often it is unclear what kind ofbehavior is required to solve the task. Reinforcement learning (RL)approaches have made progress by using direct interaction withthe task environment, but have so far not scaled well to large statespaces and environments that are not fully observable. In recentyears, neuroevolution, the artificial evolution of neural networks,has had remarkable success in tasks that exhibit these two properties.In this paper, we compare a neuroevolution methodcalled Cooperative Synapse Neuroevolution (CoSyNE), that usescooperative coevolution at the level of individual synaptic weights,to a broad range of reinforcement learningalgorithms on very difficult versions of the pole balancing problemthat involve large (continuous) state spaces andhidden state. CoSyNE is shown to be significantly more efficient and powerfulthan the other methods on these tasks.

多くの複雑な制御問題には、従来のコントローラ設計では対応できない高度なソリューションが必要です。現実世界のシステムをモデル化するのが困難なだけでなく、タスクを解決するためにどのような動作が必要かが不明瞭な場合がよくあります。強化学習(RL)アプローチは、タスク環境との直接的な相互作用を使用することで進歩してきましたが、これまでのところ、大規模な状態空間や完全に観測可能ではない環境にうまく拡張されていません。近年、ニューロ進化、つまりニューラルネットワークの人工進化は、これら2つの特性を示すタスクで目覚ましい成功を収めています。この論文では、個々のシナプス重みのレベルで協調的共進化を使用する協調シナプスニューロ進化(CoSyNE)と呼ばれるニューロ進化手法を、大規模な(連続)状態空間と隠れ状態を伴う極バランス問題の非常に難しいバージョンでの幅広い強化学習アルゴリズムと比較します。CoSyNEは、これらのタスクで他の方法よりも大幅に効率的で強力であることが示されています。

Bouligand Derivatives and Robustness of Support Vector Machines for Regression
回帰のためのサポートベクトルマシンのブーリガンド導関数とロバスト性

We investigate robustness properties for a broadclass of support vector machines with non-smooth loss functions.These kernel methods are inspired by convex risk minimization ininfinite dimensional Hilbert spaces. Leading examples are thesupport vector machine based on the ε-insensitive loss function,and kernel based quantile regression based on the pinball lossfunction. Firstly, we propose with the Bouligand influence function(BIF) a modification of F.R. Hampel’s influence function. The BIFhas the advantage of being positive homogeneous which is in generalnot true for Hampel’s influence function. Secondly, we show thatmany support vector machines based on a Lipschitz continuous lossfunction and a bounded kernel have a bounded BIF and are thus robustin the sense of robust statistics based on influence functions.

私たちは、非平滑損失関数を持つ幅広いサポートベクトルマシンのロバスト性特性を調査します。これらのカーネル法は、凸リスク最小化、無限次元ヒルベルト空間に触発されています。代表的な例は、ε非感受性損失関数に基づくサポートベクターマシンと、ピンボール損失関数に基づくカーネルベースの分位点回帰です。まず、ブーリガンド影響関数(BIF)を用いて、F.R.ハンペルの影響関数の修正を提案します。BIFには、正で均質であるという利点がありますが、これは一般的にハンペルの影響関数には当てはまりません。次に、リプシッツ連続損失関数と有界カーネルに基づく多くのサポートベクターマシンが有界BIFを持ち、したがって影響関数に基づくロバスト統計という意味でロバストであることを示します。

Graphical Methods for Efficient Likelihood Inference in Gaussian Covariance Models
ガウス共分散モデルにおける効率的な尤度推論のためのグラフィカルな手法

In graphical modelling, a bi-directed graph encodes marginalindependences among random variables that are identified with thevertices of the graph. We show how to transform a bi-directed graphinto a maximal ancestral graph that (i) represents the sameindependence structure as the original bi-directed graph, and (ii)minimizes the number of arrowheads among all ancestral graphssatisfying (i). Here the number of arrowheads of an ancestral graph isthe number of directed edges plus twice the number of bi-directededges. In Gaussian models, this construction can be used for moreefficient iterative maximization of the likelihood function and todetermine when maximum likelihood estimates are equal to empiricalcounterparts.

グラフィカルモデリングでは、双方向グラフは、グラフの頂点で識別される確率変数間の周辺独立性をエンコードします。私たちは、双向グラフを、(i)元の双向グラフと同じ独立構造を表し、(ii)すべての祖先グラフの中で(i)を満たす矢印の数を最小化する最大祖先グラフに変換する方法を示します。ここで、祖先グラフの矢印の数は、有向エッジの数に双指向エッジの数の2倍を加えた数です。ガウスモデルでは、この構造を使用して、尤度関数のより効率的な反復的最大化を行い、最尤推定値が経験的推定値と等しいタイミングを判断できます。

An Error Bound Based on a Worst Likely Assignment
最も可能性の低い割り当てに基づくエラー制限

This paper introduces a new PAC transductive error bound forclassification. The method uses information from the training examplesand inputs of working examples to develop a set of likely assignmentsto outputs of the working examples. A likely assignment with maximumerror determines the bound. The method is very effective for smalldata sets.

この論文では、分類のための新しいPAC変換誤差範囲について紹介します。この手法では、トレーニング例と作業例の入力からの情報を使用して、作業例の出力に対する一連の可能性の高い割り当てを開発します。maximumerrorを持つ可能性のある代入によって、境界が決定されます。この方法は、小さなデータセットに非常に効果的です。

Finite-Time Bounds for Fitted Value Iteration
適合値の反復の有限時間境界

In this paper we develop a theoretical analysis of the performance ofsampling-based fitted value iteration (FVI) to solve infinitestate-space, discounted-reward Markovian decision processes (MDPs)under the assumption that a generative model of the environment isavailable. Our main results come in the form of finite-time bounds onthe performance of two versions of sampling-based FVI. Theconvergence rate results obtained allow us to show that both versionsof FVI are well behaving in the sense that by using a sufficientlylarge number of samples for a large class of MDPs, arbitrary goodperformance can be achieved with high probability. An importantfeature of our proof technique is that it permits the study ofweighted Lp-norm performance bounds. As a result, our techniqueapplies to a large class of function-approximation methods (e.g.,neural networks, adaptive regression trees, kernel machines, locallyweighted learning), and our bounds scale well with the effectivehorizon of the MDP. The bounds show a dependence on the stochasticstability properties of the MDP: they scale with thediscounted-average concentrability of the future-statedistributions. They also depend on a new measure of the approximationpower of the function space, the inherent Bellman residual, whichreflects how well the function space is “aligned” with the dynamicsand rewards of the MDP. The conditions of the main result, as well asthe concepts introduced in the analysis, are extensively discussed andcompared to previous theoretical results. Numerical experiments areused to substantiate the theoretical findings.

この論文では、環境の生成モデルが利用可能であるという仮定の下で、無限状態空間、割引報酬マルコフ決定プロセス(MDP)を解決するためのサンプリングベースの適合値反復(FVI)のパフォーマンスの理論的分析を展開します。主な結果は、サンプリングベースのFVIの2つのバージョンのパフォーマンスに対する有限時間境界の形で得られます。得られた収束率の結果から、大規模なMDPクラスに対して十分に大きなサンプル数を使用することで、任意の良好なパフォーマンスを高い確率で達成できるという意味で、FVIの両方のバージョンが適切に動作していることがわかります。私たちの証明手法の重要な特徴は、重み付けされたLpノルムのパフォーマンス境界の研究を可能にすることです。その結果、私たちの手法は大規模な関数近似法(ニューラルネットワーク、適応型回帰ツリー、カーネルマシン、局所的に重み付けされた学習など)に適用でき、境界はMDPの有効期間に合わせて適切にスケーリングされます。境界は、MDPの確率的安定性特性に依存します。つまり、将来の状態分布の割引平均集中度に応じて変化します。また、関数空間の近似力の新しい尺度である固有のベルマン残差にも依存します。これは、関数空間がMDPのダイナミクスと報酬にどの程度「適合」しているかを反映します。主な結果の条件と、分析で導入された概念は、広範囲にわたって議論され、以前の理論的結果と比較されます。数値実験は、理論的発見を実証するために使用されます。

Bayesian Inference and Optimal Design for the Sparse Linear Model
スパース線形モデルのためのベイズ推論と最適設計

The linear model with sparsity-favouring prior on the coefficients hasimportant applications in many different domains. In machine learning,most methods to date search for maximum a posteriori sparse solutions andneglect to represent posterior uncertainties. In this paper, we addressproblems of Bayesian optimal design (or experiment planning), for whichaccurate estimates of uncertainty are essential. To this end, we employexpectation propagation approximate inference for the linear model withLaplace prior, giving new insight into numerical stability properties andproposing a robust algorithm. We also show how to estimate modelhyperparameters by empirical Bayesian maximisation of the marginal likelihood,and propose ideas in order to scale up the method to very largeunderdetermined problems.We demonstrate the versatility of our framework on the application ofgene regulatory network identification from micro-array expression data,where both the Laplace prior and the active experimental design approach areshown to result in significant improvements. We also address the problem ofsparse coding of natural images, and show how our framework can be usedfor compressive sensing tasks.Part of this work appeared in Seeger et al. (2007b). The gene networkidentification application appears in Steinke et al. (2007).

係数にスパース性優先の事前分布を持つ線形モデルは、さまざまな分野で重要な用途があります。機械学習では、これまでのほとんどの方法は、事後的に最大のスパース解を求め、事後不確実性を表現することを怠っています。この論文では、不確実性の正確な推定が不可欠なベイズ最適設計(または実験計画)の問題を取り上げます。この目的のために、ラプラス事前分布を持つ線形モデルに期待伝播近似推論を採用し、数値安定性特性に関する新しい洞察を提供し、堅牢なアルゴリズムを提案します。また、ベイズ経験的周辺尤度最大化によってモデルのハイパーパラメータを推定する方法を示し、この方法を非常に大規模な劣決定問題に拡張するためのアイデアを提案します。マイクロアレイ発現データからの遺伝子調節ネットワーク識別への適用におけるフレームワークの汎用性を示し、ラプラス事前分布と能動実験設計アプローチの両方が大幅な改善をもたらすことを示します。また、自然画像のスパースコーディングの問題にも取り組み、このフレームワークを圧縮センシングタスクに使用する方法を示します。この研究の一部は、Seegerら(2007b)に掲載されています。遺伝子ネットワーク識別アプリケーションは、Steinkeら(2007)に掲載されています。

Multi-class Discriminant Kernel Learning via Convex Programming
凸型プログラミングによる多クラス判別カーネル学習

Regularized kernel discriminant analysis (RKDA) performs lineardiscriminant analysis in the feature space via the kernel trick. Itsperformance depends on the selection of kernels. In this paper, weconsider the problem of multiple kernel learning (MKL) for RKDA, inwhich the optimal kernel matrix is obtained as a linear combinationof pre-specified kernel matrices. We show that the kernel learningproblem in RKDA can be formulated as convex programs. First, we showthat this problem can be formulated as a semidefinite program (SDP).Based on the equivalence relationship between RKDA and least squareproblems in the binary-class case, we propose a convex quadraticallyconstrained quadratic programming (QCQP) formulation for kernellearning in RKDA. A semi-infinite linear programming (SILP)formulation is derived to further improve the efficiency. We extendthese formulations to the multi-class case based on a key resultestablished in this paper. That is, the multi-class RKDA kernellearning problem can be decomposed into a set of binary-class kernellearning problems which are constrained to share a common kernel.Based on this decomposition property, SDP formulations are proposedfor the multi-class case. Furthermore, it leads naturally to QCQPand SILP formulations. As the performance of RKDA depends on theregularization parameter, we show that this parameter can also beoptimized in a joint framework with the kernel. Extensiveexperiments have been conducted and analyzed, and connections toother algorithms are discussed.

正規化カーネル判別分析(RKDA)は、カーネルトリックを介して特徴空間で線形判別分析を実行します。そのパフォーマンスはカーネルの選択に依存します。この論文では、最適なカーネル行列が事前に指定されたカーネル行列の線形結合として取得される、RKDAの多重カーネル学習(MKL)の問題を検討します。RKDAのカーネル学習問題は、凸計画として定式化できることを示します。まず、この問題は半正定値計画(SDP)として定式化できることを示します。バイナリクラスの場合のRKDAと最小二乗問題間の同値関係に基づいて、RKDAのカーネル学習用の凸二次制約二次計画(QCQP)定式化を提案します。半無限線形計画(SILP)定式化は、効率をさらに向上させるために導出されます。この論文で確立された重要な結果に基づいて、これらの定式化をマルチクラスの場合に拡張します。つまり、マルチクラスRKDAカーネル学習問題は、共通のカーネルを共有するように制約されたバイナリクラスカーネル学習問題のセットに分解できます。この分解特性に基づいて、マルチクラスの場合のSDP定式化が提案されています。さらに、これは自然にQCQPおよびSILP定式化につながります。RKDAのパフォーマンスは正規化パラメータに依存するため、このパラメータもカーネルとの共同フレームワークで最適化できることを示します。広範な実験が行われ、分析され、他のアルゴリズムとの関連が議論されています。

Learning Control Knowledge for Forward Search Planning
前方捜索計画のための学習制御知識

A number of today’s state-of-the-art planners are based on forwardstate-space search. The impressive performance can be attributed toprogress in computing domain independent heuristics that perform wellacross many domains. However, it is easy to find domains where suchheuristics provide poor guidance, leading to planning failure.Motivated by such failures, the focus of this paper is to investigatemechanisms for learning domain-specific knowledge to better controlforward search in a given domain. While there has been a large bodyof work on inductive learning of control knowledge for AI planning,there is a void of work aimed at forward-state-space search. Onereason for this may be that it is challenging to specify a knowledgerepresentation for compactly representing important concepts across awide range of domains. One of the main contributions of this work isto introduce a novel feature space for representing such controlknowledge. The key idea is to define features in terms of informationcomputed via relaxed plan extraction, which has been a major source ofsuccess for non-learning planners. This gives a new way of leveragingrelaxed planning techniques in the context of learning. Using thisfeature space, we describe three forms of control knowledge—reactivepolicies (decision list rules and measures of progress) and linearheuristics—and show how to learn them and incorporate them intoforward state-space search. Our empirical results show that ourapproaches are able to surpass state-of-the-art non-learning plannersacross a wide range of planning competition domains.

今日の最先端のプランナーの多くは、順方向状態空間探索に基づいています。この優れたパフォーマンスは、多くのドメインで優れたパフォーマンスを発揮するドメインに依存しないヒューリスティックの計算の進歩によるものです。しかし、そのようなヒューリスティックが不十分なガイダンスを提供し、プランニングの失敗につながるドメインを見つけることは簡単です。このような失敗に動機づけられて、この論文では、特定のドメインで順方向探索をより適切に制御するためにドメイン固有の知識を学習するメカニズムを調査することに焦点を当てています。AIプランニングの制御知識の帰納的学習に関する研究は多数行われてきましたが、順方向状態空間探索を目的とした研究はありません。その理由の1つは、幅広いドメインにわたって重要な概念をコンパクトに表現するための知識表現を指定することが難しいためと考えられます。この研究の主な貢献の1つは、このような制御知識を表現するための新しい特徴空間を導入することです。重要なアイデアは、緩和計画抽出によって計算された情報の観点から特徴を定義することです。これは、非学習プランナーの成功の大きな源となっています。これにより、学習のコンテキストで緩和計画手法を活用する新しい方法が提供されます。この特徴空間を使用して、リアクティブポリシー(決定リストルールと進捗の測定)と線形ヒューリスティックという3つの形式の制御知識を説明し、それらを学習して前向き状態空間検索に組み込む方法を示します。実験結果から、このアプローチは、さまざまな計画競合ドメインで最先端の非学習プランナーを上回ることができることがわかります。

Graphical Models for Structured Classification, with an Application to Interpreting Images of Protein Subcellular Location Patterns
構造化分類のためのグラフィカルモデルと、タンパク質細胞内位置パターンの画像解釈への応用

In structured classification problems, there is a direct conflictbetween expressive models and efficient inference: while graphicalmodels such as Markov random fields or factor graphs can representarbitrary dependences among instance labels, the cost of inference viabelief propagation in these models grows rapidly as the graphstructure becomes more complicated. One important source ofcomplexity in belief propagation is the need to marginalize largefactors to compute messages. This operation takes time exponential inthe number of variables in the factor, and can limit theexpressiveness of the models we can use. In this paper, we study anew class of potential functions, which we call decomposablek-way potentials, and provide efficient algorithms forcomputing messages from these potentials during belief propagation.We believe these new potentials provide a good balance betweenexpressive power and efficient inference in practical structuredclassification problems. We discuss three instances of decomposablepotentials: the associative Markov network potential, the nestedjunction tree, and a new type of potential which we call the votingpotential. We use these potentials to classify images of proteinsubcellular location patterns in groups of cells. Classifyingsubcellular location patterns can help us answer many importantquestions in computational biology, including questions about howvarious treatments affect the synthesis and behavior of proteins andnetworks of proteins within a cell. Our new representation andalgorithm lead to substantial improvements in both inference speed andclassification accuracy.

構造化分類問題では、表現力豊かなモデルと効率的な推論の間には直接的な矛盾があります。マルコフ確率場や因子グラフなどのグラフィカルモデルはインスタンスラベル間の任意の依存関係を表現できますが、これらのモデルでの信念伝播による推論のコストは、グラフ構造が複雑になるにつれて急速に増大します。信念伝播における複雑さの重要な原因の1つは、メッセージを計算するために大きな因子を周辺化する必要があることです。この操作は、因子の変数の数に応じて指数関数的に時間がかかり、使用できるモデルの表現力を制限する可能性があります。この論文では、分解可能なk方向ポテンシャルと呼ばれる新しいクラスのポテンシャル関数を研究し、信念伝播中にこれらのポテンシャルからメッセージを計算するための効率的なアルゴリズムを提供します。これらの新しいポテンシャルは、実用的な構造化分類問題における表現力と効率的な推論の適切なバランスを提供すると考えています。ここでは、分解可能なポテンシャルの3つの例、すなわち連想マルコフネットワークポテンシャル、ネストされたジャンクションツリー、および投票ポテンシャルと呼ばれる新しいタイプのポテンシャルについて説明します。これらのポテンシャルを使用して、細胞群内のタンパク質細胞内位置パターンの画像を分類します。細胞内位置パターンを分類すると、さまざまな処理がタンパク質の合成と動作、および細胞内のタンパク質ネットワークにどのような影響を与えるかなど、計算生物学における多くの重要な疑問に答えることができます。新しい表現とアルゴリズムにより、推論速度と分類精度の両方が大幅に向上します。

Trust Region Newton Method for Logistic Regression
ロジスティック回帰のための信頼領域ニュートン法

Large-scale logistic regression arises in many applications such asdocument classification and natural language processing. In thispaper, we apply a trust region Newton method to maximize thelog-likelihood of the logistic regression model. The proposed methoduses only approximate Newton steps in the beginning, but achieves fastconvergence in the end. Experiments show that it is faster than thecommonly used quasi Newton approach for logistic regression. We alsoextend the proposed method to large-scale L2-loss linear supportvector machines (SVM).

大規模なロジスティック回帰は、ドキュメント分類や自然言語処理などの多くのアプリケーションで発生します。この論文では、信頼領域ニュートン法を適用して、ロジスティック回帰モデルの対数尤度を最大化します。提案された方法は、最初は近似的なニュートンステップのみを使用しますが、最終的には高速収束を達成します。実験によると、ロジスティック回帰に一般的に使用される準ニュートン法よりも高速です。また、提案手法を大規模なL2損失線形サポートベクターマシン(SVM)に拡張します。

A Library for Locally Weighted Projection Regression
局所的に重み付けされた射影回帰のライブラリ

In this paper we introduce an improved implementation of locally weightedprojection regression (LWPR), a supervised learning algorithm that iscapable of handling high-dimensional input data. As the key features,our code supports multi-threading, is available for multipleplatforms, and provides wrappers for several programming languages.

この論文では、高次元の入力データを処理できる教師あり学習アルゴリズムであるLocally WeightedProjection Regression (LWPR)の改良された実装を紹介します。主な機能として、私たちのコードはマルチスレッドをサポートし、複数のプラットフォームで利用でき、いくつかのプログラミング言語のラッパーを提供します。

Learning Reliable Classifiers From Small or Incomplete Data Sets: The Naive Credal Classifier 2
小さなデータセットまたは不完全なデータセットからの信頼性の高い分類器の学習:ナイーブクレダル分類器2

In this paper, the naive credal classifier, which is aset-valued counterpart of naive Bayes, is extended to a general andflexible treatment of incomplete data, yielding a new classifiercalled naive credal classifier 2 (NCC2). The new classifier deliversclassifications that are reliable even in the presence of small samplesizes and missing values. Extensive empirical evaluations show that,by issuing set-valued classifications, NCC2 is able to isolate andproperly deal with instances that are hard to classify (on which naiveBayes accuracy drops considerably), and to perform as well as naiveBayes on the other instances. The experiments point to a generalproblem: they show that with missing values, empirical evaluations maynot reliably estimate the accuracy of a traditional classifier, suchas naive Bayes. This phenomenon adds even more value to the robustapproach to classification implemented by NCC2.

この論文では、ナイーブベイズのaset値対応物であるナイーブクレダル分類器を、不完全なデータの一般的で柔軟な処理に拡張し、ナイーブクレダル分類器2(NCC2)と呼ばれる新しい分類器を生み出します。新しい分類子は、サンプルサイズが小さく、値が欠損している場合でも信頼性の高い分類を提供します。広範な経験的評価により、NCC2は、セット値の分類を発行することにより、分類が困難なインスタンス(naiveBayesの精度が大幅に低下する)を分離して適切に処理し、他のインスタンスに対してnaiveBayesと同様に実行できることが示されています。実験は一般的な問題を指し示しています:彼らは、欠損値がある場合、経験的評価が単純なベイズなどの従来の分類器の精度を確実に推定できない可能性があることを示しています。この現象は、NCC2によって実装される分類へのロバストなアプローチにさらに価値を追加します。

Closed Sets for Labeled Data
ラベル付きデータの閉集合

Closed sets have been proven successful in the context of compacteddata representation for association rule learning. However, their useis mainly descriptive, dealing only with unlabeled data. This papershows that when considering labeled data, closed sets can be adaptedfor classification and discrimination purposes by convenientlycontrasting covering properties on positive and negative examples. Weformally prove that these sets characterize the space of relevantcombinations of features for discriminating the target class. Inpractice, identifying relevant/irrelevant combinations of featuresthrough closed sets is useful in many applications:to compact emergingpatterns of typical descriptive mining applications, to reduce the numberof essential rules in classification, and to efficiently learn subgroup descriptions, as demonstrated in real-life subgroup discoveryexperiments on a high dimensional microarray data set.

閉集合は、アソシエーションルール学習のための圧縮データ表現のコンテキストで成功することが証明されています。ただし、それらの使用は主に説明的であり、ラベル付けされていないデータのみを扱います。この論文では、ラベル付きデータを検討する場合、正の例と否定的な例で被覆特性を便利に対比することにより、閉集合を分類および識別の目的に適合させることができることを示しています。これらのセットが、ターゲットクラスを区別するための特徴の関連性のある組み合わせの空間を特徴付けることを正式に証明します。実際には、クローズドセットを通じて特徴の関連性/無関係な組み合わせを特定することは、多くのアプリケーションで役立ちます:典型的な記述的マイニングアプリケーションの新興パターンをコンパクトにし、分類に不可欠なルールの数を減らし、高次元マイクロアレイデータセットでの実際のサブグループ発見実験で示されているように、サブグループの説明を効率的に学習します。

An Information Criterion for Variable Selection in Support Vector Machines
サポートベクトルマシンにおける変数選択のための情報量基準

Support vector machines for classification have the advantage thatthe curse of dimensionality is circumvented. It has been shown thata reduction of the dimension of the input space leads to even betterresults. For this purpose, we propose two information criteria whichcan be computed directly from the definition of the support vectormachine. We assess the predictive performance of the models selectedby our new criteria and compare them to existing variable selectiontechniques in a simulation study. The simulation results show thatthe new criteria are competitive in terms of generalization errorrate while being much easier to compute. We arrive at the samefindings for comparison on some real-world benchmark data sets.

分類のためのサポートベクターマシンには、次元の呪いが回避されるという利点があります。入力スペースの次元を縮小すると、さらに良い結果が得られることが示されています。この目的のために、サポートベクターマシンの定義から直接計算できる2つの情報基準を提案します。新しい基準で選択されたモデルの予測性能を評価し、シミュレーション研究の既存の変数選択手法と比較します。シミュレーションの結果は、新しい基準が一般化エラー率の点で競争力があり、計算がはるかに容易であることを示しています。私たちは、いくつかの実際のベンチマークデータセットで比較するために同じ結果に到達します。

Estimating the Confidence Interval for Prediction Errors of Support Vector Machine Classifiers
サポートベクターマシン分類器の予測誤差の信頼区間の推定

Support vector machine (SVM) is one of the most popular andpromising classification algorithms. After a classification rule isconstructed via the SVM, it is essential to evaluate its predictionaccuracy. In this paper, we develop procedures for obtaining bothpoint and interval estimators for the prediction error. Under mildregularity conditions, we derive the consistency and asymptoticnormality of the prediction error estimators for SVM withfinite-dimensional kernels. A perturbation-resampling procedure isproposed to obtain interval estimates for the prediction error inpractice. With numerical studies on simulated data and a benchmarkrepository, we recommend the use of interval estimates centered atthe cross-validated point estimates for the prediction error.Further applications of the proposed procedure in model evaluationand feature selection are illustrated with two examples.

サポートベクターマシン(SVM)は、最も人気があり有望な分類アルゴリズムの1つです。SVMを介して分類ルールを構築した後、その予測精度を評価することが不可欠です。この論文では、予測誤差のポイント推定量と区間推定量の両方を取得するための手順を開発します。穏やかな規則性条件下では、有限次元カーネルを持つSVMの予測誤差推定量の一貫性と漸近正規性を導出します。予測誤差のインプラクティスの区間推定値を取得するために、摂動リサンプリング手順が提案されています。シミュレーションデータとbenchmarkrepositoryに関する数値計算研究では、予測誤差の交差検証された点推定を中心とした間隔推定を使用することをお勧めします。モデル評価と特徴選択における提案された手順のさらなる適用を、2つの例で説明します。

Comments on the Complete Characterization of a Family of Solutions to a Generalized Fisher Criterion
一般化されたフィッシャー基準に対する解のファミリーの完全な特性評価に関するコメント

Loog (2007) provided a complete characterization of thefamily of solutions to a generalized Fisher criterion. We showthat this characterization is essentially equivalent to the originalcharacterization proposed in Ye (2005). The computationaladvantage of the original characterization over the new one isdiscussed, which justifies its practical use.

Loog (2007)は、一般化されたフィッシャー基準に対する解のファミリーの完全な特性付けを提供しました。この特徴付けは、Ye(2005)で提案された元の特徴付けと本質的に同等であることを示します。新しい特性に対する元の特性評価の計算上の利点が議論され、その実用化が正当化されます。

Model Selection Through Sparse Maximum Likelihood Estimation for Multivariate Gaussian or Binary Data
多変量ガウスデータまたはバイナリデータに対するスパース最尤推定によるモデル選択

We consider the problem of estimating the parameters of a Gaussian orbinary distribution in such a way that the resulting undirectedgraphical model is sparse. Our approach is to solve a maximumlikelihood problem with an added l1-norm penalty term. Theproblem as formulated is convex but the memory requirements andcomplexity of existing interior point methods are prohibitive forproblems with more than tens of nodes. We present two new algorithmsfor solving problems with at least a thousand nodes in the Gaussiancase. Our first algorithm uses block coordinate descent, and can beinterpreted as recursive l1-norm penalized regression. Oursecond algorithm, based on Nesterov’s first order method, yields acomplexity estimate with a better dependence on problem size thanexisting interior point methods. Using a log determinant relaxationof the log partition function (Wainwright and Jordan, 2006), we show that thesesame algorithms can be used to solve an approximate sparse maximumlikelihood problem for the binary case. We test our algorithms onsynthetic data, as well as on gene expression and senate votingrecords data.

私たちは、ガウス分布またはバイナリ分布のパラメータを推定する問題を考察します。この方法では、結果として得られる無向グラフィカルモデルがスパースになります。このアプローチは、l1ノルムペナルティ項を追加して最大尤度問題を解くことです。定式化された問題は凸型ですが、既存の内点法のメモリ要件と複雑さは、数十を超えるノードの問題には適用できません。ガウス分布の場合に少なくとも1,000ノードの問題を解決するための2つの新しいアルゴリズムを紹介します。最初のアルゴリズムはブロック座標降下法を使用し、再帰的なl1ノルムペナルティ付き回帰として解釈できます。ネステロフの1次法に基づく2番目のアルゴリズムは、既存の内点法よりも問題のサイズに対する依存性が高い複雑さの推定値を生成します。対数分割関数の対数行列式緩和法(WainwrightおよびJordan、2006)を使用することで、同じアルゴリズムを使用してバイナリケースの近似スパース最大尤度問題を解決できることを示します。合成データ、遺伝子発現、上院投票記録データでアルゴリズムをテストします。

A Recursive Method for Structural Learning of Directed Acyclic Graphs
有向非巡回グラフの構造学習のための再帰的手法

In this paper, we propose a recursive method for structural learningof directed acyclic graphs (DAGs), in which a problem of structurallearning for a large DAG is first decomposed into two problems ofstructural learning for two small vertex subsets, each of which isthen decomposed recursively into two problems of smaller subsetsuntil none subset can be decomposed further. In our approach, searchfor separators of a pair of variables in a large DAG is localized tosmall subsets, and thus the approach can improve the efficiency ofsearches and the power of statistical tests for structural learning.We show how the recent advances in the learning of undirectedgraphical models can be employed to facilitate the decomposition.Simulations are given to demonstrate the performance of the proposedmethod.

この論文では、有向非巡回グラフ(DAG)の構造学習のための再帰的方法を提案します。これは、大きなDAGの構造学習の問題が最初に2つの小さな頂点サブセットの構造学習の問題に分解され、それぞれが再帰的に小さなサブセットの2つの問題に再分解され、サブセットがさらに分解できなくなるまでです。私たちのアプローチでは、大きなDAG内の変数のペアのセパレータの検索は小さなサブセットにローカライズされるため、このアプローチにより、検索の効率と構造学習のための統計的検定の検出力を向上させることができます。無向グラフィカルモデルの学習における最近の進歩が、分解を容易にするためにどのように使用できるかを示します。提案された方法の性能を実証するためにシミュレーションが行われます。

Theoretical Advantages of Lenient Learners: An Evolutionary Game Theoretic Perspective
寛大な学習者の理論的利点:進化ゲーム理論の視点

This paper presents the dynamics of multiple learning agents from anevolutionary game theoretic perspective. We provide replicatordynamics models for cooperative coevolutionary algorithms and fortraditional multiagent Q-learning, and we extend these differentialequations to account for lenient learners: agents that forgivepossible mismatched teammate actions that resulted in low rewards. Weuse these extended formal models to study the convergence guaranteesfor these algorithms, and also to visualize the basins of attractionto optimal and suboptimal solutions in two benchmark coordinationproblems. The paper demonstrates that lenience provides learners withmore accurate information about the benefits of performing theiractions, resulting in higher likelihood of convergence to the globallyoptimal solution. In addition, the analysis indicates that the choiceof learning algorithm has an insignificant impact on the overallperformance of multiagent learning algorithms; rather, the performanceof these algorithms depends primarily on the level of lenience thatthe agents exhibit to one another. Finally, the research hereinsupports the strength and generality of evolutionary game theory as abackbone for multiagent learning.

この論文では、進化ゲーム理論の観点から、複数の学習エージェントのダイナミクスについて説明します。協力的共進化アルゴリズムと従来のマルチエージェントQ学習用のレプリケータダイナミクスモデルを提供し、これらの微分方程式を拡張して寛大な学習者(低い報酬につながる可能性のあるチームメイトの不一致なアクションを許すエージェント)を考慮します。これらの拡張された形式モデルを使用して、これらのアルゴリズムの収束保証を調査し、2つのベンチマーク調整問題における最適および準最適ソリューションへの吸引領域を視覚化します。この論文では、寛大さによって学習者にアクションを実行する利点に関するより正確な情報が提供され、グローバルに最適ソリューションへの収束の可能性が高くなることを示しています。さらに、分析により、学習アルゴリズムの選択はマルチエージェント学習アルゴリズムの全体的なパフォーマンスにほとんど影響を与えないことが示されています。むしろ、これらのアルゴリズムのパフォーマンスは、エージェントが互いに示す寛大さのレベルに主に依存します。最後に、本研究は、マルチエージェント学習のバックボーンとしての進化ゲーム理論の強さと一般性を裏付けています。

A Tutorial on Conformal Prediction
共形予測のチュートリアル

Conformal prediction uses past experience to determine precise levelsof confidence in new predictions. Given an error probabilityε, together with a method that makes a prediction ŷof a label y, it produces a set of labels, typically containingŷ, that also contains y with probability 1 ε.Conformal prediction can be applied to any method for producingŷ: a nearest-neighbor method, a support-vector machine, ridgeregression, etc. Conformal prediction is designed for an on-line setting in whichlabels are predicted successively, each one being revealed before thenext is predicted. The most novel and valuable feature of conformalprediction is that if the successive examples are sampledindependently from the same distribution, then the successivepredictions will be right 1 ε of the time, even though theyare based on an accumulating data set rather than on independent datasets.In addition to the model under which successive examples are sampledindependently, other on-line compression models can also use conformalprediction. The widely used Gaussian linear model is one of these.This tutorial presents a self-contained account of the theory ofconformal prediction and works through several numerical examples. Amore comprehensive treatment of the topic is provided inAlgorithmic Learning in a Random World, by Vladimir Vovk,Alex Gammerman, and Glenn Shafer (Springer, 2005).

共形予測は、過去の経験を使用して、新しい予測の正確な信頼度を決定します。誤差確率εと、ラベルyの予測ŷを行う方法とを一緒に与えると、通常はŷを含み、確率1εでyも含むラベルのセットが生成されます。共形予測は、最近傍法、サポートベクターマシン、リッジ回帰など、ŷを生成する任意の方法に適用できます。共形予測は、ラベルが連続して予測され、各ラベルが次のラベルを予測する前に明らかにされるオンライン設定用に設計されています。共形予測の最も新しくて価値のある機能は、連続する例が同じ分布から独立してサンプリングされる場合、独立したデータセットではなく蓄積データセットに基づいているにもかかわらず、連続する予測が1εの確率で正確になることです。連続する例が独立してサンプリングされるモデルに加えて、他のオンライン圧縮モデルでも共形予測を使用できます。広く使用されているガウス線形モデルは、これらの1つです。このチュートリアルでは、共形予測の理論について自己完結的な説明を示し、いくつかの数値例を取り上げます。このトピックのより包括的な説明は、Vladimir Vovk、Alex Gammerman、およびGlenn Shaferによる「Algorithmic Learning in a Random World」(Springer、2005年)に記載されています。

Generalization from Observed to Unobserved Features by Clustering
クラスタリングによる観測された特徴から観測されていない特徴への一般化

We argue that when objects are characterized by many attributes, clusteringthem on the basis of a random subset of these attributes cancapture information on the unobserved attributes as well. Moreover,we show that under mild technical conditions, clustering the objectson the basis of such a random subset performs almost as well as clusteringwith the full attribute set. We prove finite sample generalizationtheorems for this novel learning scheme that extends analogous resultsfrom the supervised learning setting. We use our framework to analyzegeneralization to unobserved features of two well-known clusteringalgorithms: k-means and the maximum likelihood multinomial mixturemodel. The scheme is demonstrated for collaborative filtering of userswith movie ratings as attributes and document clustering with wordsas attributes.

私たちは、オブジェクトが多くの属性によって特徴付けられる場合、これらの属性のランダムなサブセットに基づいてそれらをクラスタリングすることで、観察されていない属性に関する情報もキャプチャできると主張します。さらに、穏やかな技術的条件下では、このようなランダムなサブセットに基づいてオブジェクトをクラスタリングすると、完全な属性セットを使用したクラスタリングとほぼ同じパフォーマンスを発揮することを示します。私たちは、教師あり学習設定からの類似の結果を拡張するこの新しい学習スキームの有限サンプル一般化定理を証明します。私たちのフレームワークを使用して、2つのよく知られたクラスタリングアルゴリズム(k-meansと最尤多項混合モデル)の観察されていない特徴への一般化を分析します。このスキームは、映画の評価を属性として持つユーザーの協調フィルタリングと、単語を属性とするドキュメントのクラスタリングで実証されています。

Algorithms for Sparse Linear Classifiers in the Massive Data Setting
大量データ設定におけるスパース線形分類器のアルゴリズム

Classifiers favoring sparse solutions, such as support vectormachines, relevance vector machines, LASSO-regression basedclassifiers, etc., provide competitive methods forclassification problems in high dimensions. However, currentalgorithms for training sparse classifiers typically scale quiteunfavorably with respect to the number of training examples. Thispaper proposes online and multi-pass algorithms for trainingsparse linear classifiers for high dimensional data. Thesealgorithms have computational complexity and memory requirementsthat make learning on massive data sets feasible. The central ideathat makes this possible is a straightforward quadraticapproximation to the likelihood function.

サポートベクターマシン、関連性ベクトルマシン、LASSO回帰ベースの分類子など、スパースソリューションを好む分類器は、高次元での分類問題に対する競争力のある方法を提供します。ただし、スパース分類器をトレーニングするための現在のアルゴリズムは、通常、トレーニング例の数に関して非常に不利にスケーリングされます。この論文では、高次元データ用のスパース線形分類器を訓練するためのオンラインおよびマルチパスアルゴリズムを提案します。これらのアルゴリズムには、計算の複雑さとメモリ要件があるため、大量のデータセットでの学習が可能になります。これを可能にする中心的な考え方は、尤度関数の単純な二次近似です。

Support Vector Machinery for Infinite Ensemble Learning
無限アンサンブル学習のためのサポートベクターマシナリー

Ensemble learning algorithms such as boosting can achieve betterperformance by averaging over the predictions of some base hypotheses.Nevertheless, most existing algorithms are limited to combining only afinite number of hypotheses, and the generated ensemble is usuallysparse. Thus, it is not clear whether we should construct an ensembleclassifier with a larger or even an infinite number of hypotheses. Inaddition, constructing an infinite ensemble itself is a challengingtask. In this paper, we formulate an infinite ensemble learningframework based on the support vector machine (SVM). The frameworkcan output an infinite and nonsparse ensemble through embeddinginfinitely many hypotheses into an SVM kernel. We use the frameworkto derive two novel kernels, the stump kernel and the perceptronkernel. The stump kernel embodies infinitely many decision stumps,and the perceptron kernel embodies infinitely many perceptrons. Wealso show that the Laplacian radial basis function kernel embodiesinfinitely many decision trees, and can thus be explained throughinfinite ensemble learning. Experimental results show that SVM withthese kernels is superior to boosting with the same base hypothesisset. In addition, SVM with the stump kernel or the perceptron kernelperforms similarly to SVM with the Gaussian radial basis functionkernel, but enjoys the benefit of faster parameter selection. Theseproperties make the novel kernels favorable choices in practice.

ブースティングなどのアンサンブル学習アルゴリズムは、いくつかの基本仮説の予測を平均化することで、より良いパフォーマンスを実現できます。ただし、既存のアルゴリズムのほとんどは、有限数の仮説のみを組み合わせることに制限されており、生成されたアンサンブルは通常スパースです。したがって、より多くの仮説または無限数の仮説を使用してアンサンブル分類器を構築する必要があるかどうかは明らかではありません。さらに、無限アンサンブルの構築自体が困難な作業です。この論文では、サポートベクターマシン(SVM)に基づく無限アンサンブル学習フレームワークを定式化します。このフレームワークは、無限の数の仮説をSVMカーネルに埋め込むことで、無限でスパースでないアンサンブルを出力できます。このフレームワークを使用して、スタンプカーネルとパーセプトロンカーネルという2つの新しいカーネルを導出します。スタンプカーネルは無限の数の決定スタンプを具体化し、パーセプトロンカーネルは無限の数のパーセプトロンを具体化します。また、ラプラシアンラジアルベース関数カーネルは無限の決定木を具体化するため、無限アンサンブル学習によって説明できることも示しています。実験結果では、これらのカーネルを使用したSVMは、同じ基本仮説セットを使用したブースティングよりも優れていることが示されています。さらに、スタンプカーネルまたはパーセプトロンカーネルを使用したSVMは、ガウスラジアルベース関数カーネルを使用したSVMと同様に機能しますが、より高速なパラメータ選択の利点があります。これらの特性により、新しいカーネルは実際に好ましい選択肢となります。

Near-Optimal Sensor Placements in Gaussian Processes: Theory, Efficient Algorithms and Empirical Studies
ガウス過程における最適近似センサ配置:理論,効率的アルゴリズムおよび実証的研究

When monitoring spatial phenomena, which can often be modeled asGaussian processes (GPs), choosing sensor locations is a fundamentaltask. There are several common strategies to address this task, forexample, geometry or disk models, placing sensors at the points ofhighest entropy (variance) in the GP model, and A-, D-, or E-optimaldesign. In this paper, we tackle the combinatorial optimizationproblem of maximizing the mutual information between thechosen locations and the locations which are not selected. We provethat the problem of finding the configuration that maximizes mutualinformation is NP-complete. To address this issue, we describe apolynomial-time approximation that is within (1-1/e) of theoptimum by exploiting the submodularity of mutualinformation. We also show how submodularity can be used to obtainonline bounds, and design branch and bound search procedures. Wethen extend our algorithm to exploit lazy evaluations and localstructure in the GP, yielding significant speedups. We also extendour approach to find placements which are robust against nodefailures and uncertainties in the model. These extensions are againassociated with rigorous theoretical approximation guarantees,exploiting the submodularity of the objective function. Wedemonstrate the advantages of our approach towards optimizing mutualinformation in a very extensive empirical study on two real-worlddata sets.

空間現象をモニタリングする場合、多くの場合ガウス過程(GP)としてモデル化できますが、センサーの位置を選択することは基本的なタスクです。このタスクに対処するための一般的な戦略には、ジオメトリまたはディスクモデル、GPモデルでエントロピー(分散)が最も高いポイントにセンサーを配置すること、A最適設計、D最適設計、またはE最適設計などがあります。この論文では、選択された場所と選択されていない場所の間の相互情報量を最大化する組み合わせ最適化問題に取り組みます。相互情報量を最大化する構成を見つける問題はNP完全であることを証明します。この問題に対処するために、相互情報量のサブモジュラ性を利用して、最適値の(1-1/e)以内の多項式時間近似について説明します。また、サブモジュラ性を使用してオンライン境界を取得し、分岐限定検索手順を設計する方法を示します。次に、アルゴリズムを拡張して、GPの遅延評価とローカル構造を利用し、大幅な高速化を実現します。また、モデル内のノード障害や不確実性に対して堅牢な配置を見つけるために、アプローチを拡張します。これらの拡張は、目的関数のサブモジュラリティを活用した厳密な理論的近似保証と再び関連しています。2つの実際のデータセットに関する非常に広範な実証的研究で、相互情報量の最適化に向けたアプローチの利点を実証します。

Optimization Techniques for Semi-Supervised Support Vector Machines
半教師付きサポートベクターマシンの最適化手法

Due to its wide applicability, the problem of semi-supervisedclassification is attracting increasing attention in machine learning.Semi-Supervised Support Vector Machines (S3VMs) are based on applyingthe margin maximization principle to both labeled and unlabeledexamples. Unlike SVMs, their formulation leads to a non-convexoptimization problem. A suite of algorithms have recently beenproposed for solving S3VMs. This paper reviews key ideas in thisliterature. The performance and behavior of various S3VMs algorithmsis studied together, under a common experimental setting.

その適用範囲が広いため、機械学習では半教師あり分類の問題がますます注目を集めています。S3VM (Semi-Supervised Support Vector Machines)は、ラベル付き例とラベルなし例の両方にマージン最大化の原則を適用することに基づいています。SVMとは異なり、その定式化は非凸最適化問題につながります。最近、S3VMを解決するための一連のアルゴリズムが提案されています。この論文では、この文献の主要なアイデアをレビューします。さまざまなS3VMアルゴリズムのパフォーマンスと動作は、共通の実験設定の下で一緒に研究されます。

Evidence Contrary to the Statistical View of Boosting
ブースティングの統計的見解に反する証拠

The statistical perspective on boosting algorithms focuses onoptimization, drawing parallels with maximum likelihood estimationfor logistic regression. In this paper we present empirical evidencethat raises questions about this view. Although the statisticalperspective provides a theoretical framework within which it ispossible to derive theorems and create new algorithms in generalcontexts, we show that there remain many unanswered importantquestions. Furthermore, we provide examples that reveal crucialflaws in the many practical suggestions and new methods that arederived from the statistical view. We perform carefully designedexperiments using simple simulation models to illustrate some ofthese flaws and their practical consequences.

ブースティングアルゴリズムの統計的視点は、最適化に焦点を当て、ロジスティック回帰の最尤推定と平行線を描きます。この論文では、この見解について疑問を投げかける経験的証拠を提示します。統計学的視点は、一般的な文脈で定理を導き出し、新しいアルゴリズムを作成することが可能である理論的枠組みを提供しますが、多くの未解決の重要な問題が残っていることを示しています。さらに、統計的視点から導き出された多くの実用的な提案と新しい方法の重大な欠陥を明らかにする例を提供します。私たちは、これらの欠陥のいくつかとその実際的な結果を説明するために、単純なシミュレーションモデルを使用して慎重に設計された実験を実行します。

Active Learning by Spherical Subdivision
球面細分化によるアクティブラーニング

We introduce a computationally feasible, “constructive” activelearning method for binary classification. The learning algorithm isinitially formulated for separable classification problems, for ahyperspherical data space with constant data density, and for greatspheres as classifiers. In order to reduce computational complexitythe version space is restricted to spherical simplices and learningprocedes by subdividing the edges of maximal length. We show that thisprocedure optimally reduces a tight upper bound on the generalizationerror. The method is then extended to other separable classificationproblems using products of spheres as data spaces and isometriesinduced by charts of the sphere. An upper bound is provided for theprobability of disagreement between classifiers (hence thegeneralization error) for non-constant data densities on the sphere.The emphasis of this work lies on providing mathematically exactperformance estimates for active learning strategies.

私たちは、二項分類のための計算的に実行可能な「構成的」アクティブラーニング方法を紹介します。学習アルゴリズムは、分離可能な分類問題、一定のデータ密度を持つ超球面データ空間、および分類器としての大球体に対して最初に定式化されます。計算の複雑さを軽減するために、バージョンのスペースは、最大長のエッジを細分化することにより、球形の単純化と学習手続きに制限されています。この手順により、汎化誤差の狭い上限が最適に減少することを示します。次に、この方法は、球の積をデータ空間として使用し、球のチャートによって誘発される等角値を使用して、他の分離可能な分類問題に拡張されます。球面上の非定数データ密度の分類器間の不一致の確率(したがって一般化誤差)には上限が与えられます。この研究の重点は、アクティブラーニング戦略の数学的に正確なパフォーマンス推定値を提供することにあります。

Discriminative Learning of Max-Sum Classifiers
最大和分類器の判別学習

The max-sum classifier predicts n-tuple of labels from n-tuple ofobservable variables by maximizing a sum of quality functions defined overneighbouring pairs of labels and observable variables.Predicting labels as MAP assignments of a Random Markov Field is aparticular example of the max-sum classifier.Learning parameters of the max-sum classifier is a challenging problembecause even computing the response of such classifier is NP-completein general. Estimating parameters using the Maximum Likelihoodapproach is feasible only for a subclass of max-sum classifiers withan acyclic structure of neighbouring pairs. Recently, the discriminative methodsrepresented by the perceptron and the Support Vector Machines, originallydesigned for binary linear classifiers, have been extended for learning somesubclasses of the max-sum classifier. Besides the max-sum classifiers with theacyclic neighbouring structure, it has been shown that the discriminativelearning is possible even with arbitrary neighbouring structure provided thequality functions fulfill some additional constraints. In this article, we extend thediscriminative approach to other three classes of max-sum classifiers withan arbitrary neighbourhood structure. We derive learning algorithms for twosubclasses of max-sum classifiers whose response can be computed in polynomialtime: (i) the max-sum classifiers with supermodular quality functions and(ii) the max-sum classifiers whose response can be computed exactly by alinear programming relaxation. Moreover, we show that the learning problemcan be approximately solved even for a general max-sum classifier.

最大和分類器は、隣接するラベルと観測可能な変数のペアに対して定義された品質関数の合計を最大化することにより、n組の観測可能な変数からn組のラベルを予測します。ランダムマルコフ場のMAP割り当てとしてラベルを予測することは、最大和分類器の特定の例です。最大和分類器のパラメータの学習は、そのような分類器の応答を計算することさえ一般にNP完全であるため、困難な問題です。最大尤度アプローチを使用してパラメータを推定することは、隣接するペアの非巡回構造を持つ最大和分類器のサブクラスに対してのみ実行可能です。最近、パーセプトロンとサポートベクターマシンによって表される識別方法は、もともとバイナリ線形分類器用に設計されましたが、最大和分類器のいくつかのサブクラスの学習用に拡張されました。非巡回近傍構造を持つ最大和分類器の他に、品質関数がいくつかの追加制約を満たす限り、任意の近傍構造でも識別学習が可能であることが示されています。この記事では、任意の近傍構造を持つ他の3つのクラスの最大和分類器に識別アプローチを拡張します。応答を多項式時間で計算できる最大和分類器の2つのサブクラスの学習アルゴリズムを導出します。(i)スーパーモジュラー品質関数を持つ最大和分類器、および(ii)応答を線形計画緩和によって正確に計算できる最大和分類器です。さらに、一般的な最大和分類器でも学習問題を近似的に解決できることを示します。

On the Suitable Domain for SVM Training in Image Coding
画像コーディングのSVMトレーニングに適したドメインについて

Conventional SVM-based image coding methods are founded onindependently restricting the distortion in every imagecoefficient at some particular image representation.Geometrically, this implies allowing arbitrary signal distortionsin an n-dimensional rectangle defined by theε-insensitivity zone in each dimension of the selectedimage representation domain. Unfortunately, not every imagerepresentation domain is well-suited for such a simple,scalar-wise, approach because statistical and/or perceptualinteractions between the coefficients may exist. Theseinteractions imply that scalar approaches may induce distortionsthat do not follow the image statistics and/or are perceptuallyannoying. Taking into account these relations would imply usingnon-rectangular ε-insensitivity regions (allowingcoupled distortions in different coefficients), which is beyondthe conventional SVM formulation.In this paper, we report a condition on the suitable domain fordeveloping efficient SVM image coding schemes.We analytically demonstrate that no linear domain fulfills thiscondition because of the statistical and perceptualinter-coefficient relations that exist in these domains.This theoretical result is experimentallyconfirmed by comparing SVM learning in previously reported lineardomains and in a recently proposed non-linear perceptual domainthat simultaneously reduces the statistical and perceptualrelations (so it is closer to fulfilling the proposed condition).These results highlight the relevance of an appropriate choice ofthe image representation before SVM learning.

従来のSVMベースの画像コーディング方法は、特定の画像表現におけるすべての画像係数の歪みを独立して制限することを基本としています。幾何学的には、これは、選択された画像表現ドメインの各次元の ε 不感帯によって定義されるn次元の長方形内で任意の信号歪みを許可することを意味します。残念ながら、係数間の統計的および/または知覚的相互作用が存在する可能性があるため、すべての画像表現ドメインがこのような単純なスカラーアプローチに適しているわけではありません。これらの相互作用は、スカラーアプローチが画像統計に従わない歪みや知覚的に不快な歪みを引き起こす可能性があることを意味します。これらの関係を考慮すると、非長方形の ε 不感領域(異なる係数の結合歪みを許容)を使用する必要があり、これは従来のSVM定式化の範囲を超えています。この論文では、効率的なSVM画像コーディングスキームを開発するための適切なドメインの条件を報告します。これらのドメインには統計的および知覚的な係数間関係が存在するため、この条件を満たす線形ドメインは存在しないことを解析的に示します。この理論的結果は、以前に報告された線形ドメインと、統計的および知覚的関係を同時に削減する(したがって、提案された条件をより満たす)最近提案された非線形知覚ドメインでのSVM学習を比較することによって実験的に確認されています。これらの結果は、SVM学習の前に画像表現を適切に選択することの重要性を強調しています。

Linear-Time Computation of Similarity Measures for Sequential Data
シーケンシャルデータの類似度測度の線形時間計算

Efficient and expressive comparison of sequences is an essentialprocedure for learning with sequential data. In this article wepropose a generic framework for computation of similarity measures forsequences, covering various kernel, distance and non-metric similarityfunctions. The basis for comparison is embedding of sequences using aformal language, such as a set of natural words, k-grams or allcontiguous subsequences. As realizations of the framework we providelinear-time algorithms of different complexity and capabilities usingsorted arrays, tries and suffix trees as underlying data structures.Experiments on data sets from bioinformatics, text processing andcomputer security illustrate the efficiency of the proposedalgorithms—enabling peak performances of up to 106pairwise comparisons per second. The utility of distances andnon-metric similarity measures for sequences as alternatives to stringkernels is demonstrated in applications of text categorization,network intrusion detection and transcription site recognition in DNA.

シーケンスの効率的で表現力豊かな比較は、シーケンシャルデータを使用した学習に不可欠な手順です。この記事では、さまざまなカーネル、距離、非メトリック類似性関数をカバーする、シーケンスの類似性尺度の計算のための一般的なフレームワークを提案します。比較の基礎は、一連の自然語、k-gram、またはすべての連続するサブシーケンスなどの形式言語を使用したシーケンスの埋め込みです。フレームワークの実現として、ソートされた配列、トライ、サフィックスツリーを基礎データ構造として使用する、さまざまな複雑さと機能の線形時間アルゴリズムを提供します。バイオインフォマティクス、テキスト処理、コンピューターセキュリティのデータセットでの実験により、提案されたアルゴリズムの効率が示され、1秒あたり最大106のペア比較というピークパフォーマンスが可能になります。文字列カーネルの代替としてシーケンスの距離と非メトリック類似性尺度の有用性は、テキスト分類、ネットワーク侵入検出、DNAの転写部位認識のアプリケーションで実証されています。

Max-margin Classification of Data with Absent Features
特徴が存在しないデータの最大マージン分類

We consider the problem of learning classifiers in structured domains,where some objects have a subset of features that are inherentlyabsent due to complex relationships between the features. Unlike thecase where a feature exists but its value is not observed, here wefocus on the case where a feature may not even exist (structurallyabsent) for some of the samples. The common approach for handlingmissing features in discriminative models is to first complete theirunknown values, and then use a standard classification procedure overthe completed data. This paper focuses on features that are known tobe non-existing, rather than have an unknown value. We show howincomplete data can be classified directly without anycompletion of the missing features using a max-margin learningframework. We formulate an objective function, based on the geometricinterpretation of the margin, that aims to maximize the margin of eachsample in its own relevant subspace. In this formulation, the linearlyseparable case can be transformed into a binary search over a seriesof second order cone programs (SOCP), a convex problem that can besolved efficiently. We also describe two approaches for optimizing thegeneral case: an approximation that can be solved as a standardquadratic program (QP) and an iterative approach for solving the exactproblem. By avoiding the pre-processing phase in which the data iscompleted, both of these approaches could offer considerablecomputational savings. More importantly, we show that the eleganthandling of missing values by our approach allows it to bothoutperform other methods when the missing values have non-trivialstructure, and be competitive with other methods when the values aremissing at random. We demonstrate our results on several standardbenchmarks and two real-world problems: edge prediction in metabolicpathways, and automobile detection in natural images.

私たちは、構造化ドメインにおける分類器の学習の問題を検討します。構造化ドメインでは、一部のオブジェクトには、特徴間の複雑な関係により、本来存在しない特徴のサブセットがあります。特徴は存在するがその値が観察されない場合とは異なり、ここでは、一部のサンプルに対して特徴が存在しない(構造的に存在しない)可能性がある場合に焦点を当てます。識別モデルで欠落した特徴を処理するための一般的な方法は、最初に未知の値を完成させ、次に完成したデータに対して標準的な分類手順を使用することです。この論文では、未知の値ではなく、存在しないことがわかっている特徴に焦点を当てます。最大マージン学習フレームワークを使用して、欠落した特徴を完成させることなく、不完全なデータを直接分類する方法を示します。マージンの幾何学的解釈に基づいて、各サンプルのマージンをその関連するサブスペースで最大化することを目的とした目的関数を定式化します。この定式化では、線形分離可能なケースを、効率的に解決できる凸問題である一連の2次円錐計画(SOCP)に対するバイナリ検索に変換できます。また、一般的なケースを最適化するための2つのアプローチについても説明します。1つは、標準的な二次計画法(QP)として解ける近似法、もう1つは正確な問題を解くための反復アプローチです。これらのアプローチは、データを完成させる前処理フェーズを回避することで、どちらもかなりの計算コストを削減できます。さらに重要なことは、欠損値を巧みに処理することで、欠損値が自明でない構造を持つ場合に他の方法よりも優れたパフォーマンスを発揮し、値がランダムに欠損している場合には他の方法と競合できることです。いくつかの標準ベンチマークと、代謝経路のエッジ予測と自然画像での自動車検出という2つの実際の問題で結果を示します。

Journal of Machine Learning Research Papers: Volume 9の論文一覧

こちらもおすすめ

Journal of Machine Learning Research Papers: Volume 18の論文一覧

Journal of Machine Learning Research Papers: Volume 5の論文一覧

Journal of Machine Learning Research Papers: Volume 15の論文一覧