Journal of Machine Learning Research Papers: Volume 4の論文一覧

Journal of Machine Learning Research Papers Volume 4に記載されている内容を一覧にまとめ、機械翻訳を交えて日本語化し掲載します。
A Generative Model for Separating Illumination and Reflectance from Images
画像から照明と反射率を分離するための生成モデル

It is well known that even slight changes in nonuniform illuminationlead to a large image variability and are crucial formany visual tasks. This paper presents a new ICA relatedprobabilistic model where the number of sources exceeds the number ofsensors to perform an image segmentation and illumination removal, simultaneously.We model illumination and reflectance in log spaceby a generalized autoregressive process and Hidden Gaussian Markov random field, respectively.The model ability to deal with segmentation of illuminated imagesis compared with a Canny edge detector and homomorphic filtering.We apply the model to two problems: synthetic image segmentation andsea surface pollution detection from intensity images.

不均一な照明のわずかな変化でさえ、大きな画像のばらつきを引き起こし、非常に重要な形態の視覚課題であることはよく知られています。この論文では、ソースの数がセンサーの数を超え、画像のセグメンテーションと照明の除去を同時に実行する新しいICA関連の確率モデルを紹介します。対数空間の照明と反射率を、それぞれ一般化自己回帰過程と隠れガウスマルコフランダム場によってモデル化します。照らされた画像のセグメンテーションを処理するモデル能力は、キャニーエッジ検出器および準同型フィルタリングと比較されます。このモデルを、合成画像のセグメンテーションと強度画像からの海面汚染検出の2つの問題に適用します。

ICA for Watermarking Digital Images
デジタル画像の透かしを入れるためのICA

We present a domain-independent ICA-based approach to watermarking. This approach can be used on images, music or video to embed either a robust or fragile watermark. In the case of robust watermarking, the method shows high information rate and robustness against malicious and non-malicious attacks, while keeping a low induced distortion. The fragile watermarking scheme, on the other hand, shows high sensitivity to tampering attempts while keeping the requirement for high information rate and low distortion. The improved performance is achieved by employing a set of statistically independent sources (the independent components) as the feature space and principled statistical decoding methods. The performance of the suggested method is compared to other state of the art approaches. The paper focuses on applying the method to digitized images although the same approach can be used for other media, such as music or video.

私たちは、ドメインに依存しないICAベースのウォーターマークへのアプローチを紹介します。このアプローチは、画像、音楽、またはビデオで使用して、堅牢な透かしまたは壊れやすい透かしを埋め込むことができます。ロバストなウォーターマークの場合、この手法は、誘起歪みを低く抑えながら、悪意のある攻撃と非悪意のある攻撃に対して高い情報率と堅牢性を示します。一方、壊れやすい透かし入れ方式は、高い情報速度と低歪みの要件を維持しながら、改ざんの試みに対して高い感度を示します。パフォーマンスの向上は、特徴空間として統計的に独立したソース(独立コンポーネント)のセットと原則的な統計的デコード方法を採用することによって達成されます。提案された方法の性能は、他の最先端のアプローチと比較されます。この論文では、デジタル化された画像にこの方法を適用することに焦点を当てていますが、音楽やビデオなどの他のメディアにも同じアプローチを使用できます。

Overlearning in Marginal Distribution-Based ICA: Analysis and Solutions
周辺分布ベースICAにおける過剰学習:分析と解法

The present paper is written as a word of caution, with users of independent component analysis (ICA) in mind, to overlearning phenomena that are often observed. We consider two types of overlearning, typical to high-order statistics based ICA. These algorithms can be seen to maximise the negentropy of the source estimates. The first kind of overlearning results in the generation of spike-like signals, if there are not enough samples in the data or there is a considerable amount of noise present. It is argued that, if the data has power spectrum characterised by 1/f curve, we face a more severe problem, which cannot be solved inside the strict ICA model. This overlearning is better characterised by bumps instead of spikes. Both overlearning types are demonstrated in the case of artificial signals as well as magnetoencephalograms (MEG). Several methods are suggested to circumvent both types, either by making the estimation of the ICA model more robust or by including further modelling of the data.

この論文では、独立成分分析(ICA)のユーザーを念頭に置き、頻繁に観察される過学習現象に対する注意を促すために書かれたものです。高次統計に基づくICAに典型的な2種類の過学習を検討します。これらのアルゴリズムは、ソース推定値のネゲントロピーを最大化すると見ることができます。最初の種類の過学習では、データに十分なサンプルがない場合、または相当量のノイズが存在する場合に、スパイクのような信号が生成されます。データのパワースペクトルが1/f曲線で特徴付けられる場合、厳密なICAモデルでは解決できない、より深刻な問題に直面すると主張されています。この過学習は、スパイクではなくバンプで特徴付けられる方が適しています。両方のタイプの過学習は、人工信号と脳磁図(MEG)の場合に実証されています。ICAモデルの推定をより堅牢にするか、データのさらなるモデリングを含めることによって、両方のタイプを回避するためのいくつかの方法が提案されています。

Blind Source Recovery: A Framework in the State Space
ブラインドソース回復: 状態空間のフレームワーク

Blind Source Recovery (BSR) denotes recovery of originalsources/signals from environments that may include convolution,temporal variation, and even nonlinearity. It also infers therecovery of sources even in the absence of precise environmentidentifiability. This paper describes, in a comprehensive fashion, ageneralized BSR formulation achieved by the application of stochasticoptimization principles to the Kullback-Liebler divergence as aperformance functional subject to the constraints of the general(i.e., nonlinear and time-varying) state space representation. Thistechnique is used to derive update laws for nonlinear time-varyingdynamical systems, which are subsequently specialized totime-invariant and linear systems. Further, the state space demixingnetwork structures have been exploited to develop learning rules,capable of handling most filtering paradigms, which can beconveniently extended to nonlinear models. In the special cases,distinct linear state-space algorithms are presented for the minimumphase and non-minimum phase mixing environment models. Conventional(FIR/IIR) filtering models are subsequently derived from this generalstructure and are compared with material in the recent literature.Illustrative simulation examples are presented to demonstrate theonline adaptation capabilities of the developed algorithms.Some of this reported work has also been implemented in dedicatedhardware/software platforms.

ブラインドソースリカバリ(BSR)は、畳み込み、時間的変化、さらには非線形性を含む可能性のある環境から元のソース/信号をリカバリすることを意味します。また、正確な環境識別可能性がない場合でも、ソースのリカバリを推測します。この論文では、一般的な(つまり、非線形で時間変化する)状態空間表現の制約に従うパフォーマンス関数として、Kullback-Lieblerダイバージェンスに確率的最適化原理を適用することによって達成される一般化されたBSR定式化について包括的に説明します。この手法は、非線形の時間変化する動的システムの更新法則を導出するために使用され、その後、時間不変で線形のシステムに特化されます。さらに、状態空間の分離ネットワーク構造は、ほとんどのフィルタリングパラダイムを処理できる学習ルールの開発に活用されており、これは非線形モデルに簡単に拡張できます。特殊なケースでは、最小位相および非最小位相混合環境モデルに対して、異なる線形状態空間アルゴリズムが提示されます。従来の(FIR/IIR)フィルタリングモデルは、この一般的な構造から派生し、最近の文献の資料と比較されます。開発されたアルゴリズムのオンライン適応機能を示すために、例示的なシミュレーション例が提示されます。報告された作業の一部は、専用のハードウェア/ソフトウェアプラットフォームにも実装されています。

Statistical Dynamics of On-line Independent Component Analysis
オンライン独立成分分析の統計ダイナミクス

The learning dynamics of on-line independent component analysis is analysed in the limit of large data dimension. We study a simple Hebbian learning algorithm that can be used to separate out a small number of non-Gaussian components from a high-dimensional data set. The de-mixing matrix parameters are confined to a Stiefel manifold of tall, orthogonal matrices and we introduce a natural gradient variant of the algorithm which is appropriate to learning on this manifold. For large input dimension the parameter trajectory of both algorithms passes through a sequence of unstable fixed points, each described by a diffusion process in a polynomial potential. Choosing the learning rate too large increases the escape time from each of these fixed points, effectively trapping the learning in a sub-optimal state. In order to avoid these trapping states a very low learning rate must be chosen during the learning transient, resulting in learning time-scales of O(N2) or O(N3) iterations where N is the data dimension. Escape from each sub-optimal state results in a sequence of symmetry breaking events as the algorithm learns each source in turn. This is in marked contrast to the learning dynamics displayed by related on-line learning algorithms for multilayer neural networks and principal component analysis. Although the natural gradient variant of the algorithm has nice asymptotic convergence properties, it has an equivalent transient dynamics to the standard Hebbian algorithm.

オンライン独立成分分析の学習ダイナミクスを、大規模データ次元の限界で分析します。高次元データセットから少数の非ガウス成分を分離するために使用できる単純なヘブビアン学習アルゴリズムを検討します。分離行列パラメータは、縦長の直交行列のStiefel多様体に限定され、この多様体での学習に適したアルゴリズムの自然勾配バリアントを導入します。入力次元が大きい場合、両方のアルゴリズムのパラメータ軌跡は、それぞれが多項式ポテンシャルの拡散プロセスで記述される一連の不安定な固定点を通過します。学習率をあまり大きく選択すると、これらの固定点からの脱出時間が長くなり、学習が準最適状態に効果的に閉じ込められます。これらの閉じ込め状態を回避するには、学習過渡時に非常に低い学習率を選択する必要があり、その結果、学習時間スケールはO(N2)またはO(N3)反復になります(Nはデータ次元)。各最適でない状態から脱出すると、アルゴリズムが各ソースを順に学習するにつれて、一連の対称性破壊イベントが発生します。これは、多層ニューラルネットワークや主成分分析の関連オンライン学習アルゴリズムによって示される学習ダイナミクスとは著しく対照的です。アルゴリズムの自然勾配バリアントは、優れた漸近収束特性を備えていますが、標準的なヘブビアンアルゴリズムと同等の過渡ダイナミクスを備えています。

A Maximum Likelihood Approach to Single-channel Source Separation
シングルチャネルソース分離への最尤法アプローチ

This paper presents a new technique for achieving blind signalseparation when given only a single channel recording. Themain concept is based on exploiting a priori sets oftime-domain basis functions learned by independent componentanalysis (ICA) to the separation of mixed source signalsobserved in a single channel. The inherent time structure ofsound sources is reflected in the ICA basis functions, whichencode the sources in a statistically efficient manner. Wederive a learning algorithm using a maximum likelihoodapproach given the observed single channel data and sets ofbasis functions. For each time point we infer the sourceparameters and their contribution factors. This inference ispossible due to prior knowledge of the basis functions and theassociated coefficient densities. A flexible model for densityestimation allows accurate modeling of the observation and ourexperimental results exhibit a high level of separationperformance for simulated mixtures as well as real environmentrecordings employing mixtures of two different sources.

この論文では、単一チャネル録音のみが与えられた場合にブラインド信号分離を実現する新しい手法を紹介します。主な概念は、独立成分分析(ICA)によって学習された時間領域基底関数の事前セットを、単一チャネルで観測された混合音源信号の分離に利用することに基づいています。音源の固有の時間構造はICA基底関数に反映され、統計的に効率的な方法で音源をエンコードします。観測された単一チャネルデータと基底関数のセットが与えられた場合、最大尤度アプローチを使用して学習アルゴリズムを導出します。各時点について、音源パラメータとその寄与因子を推測します。この推測は、基底関数と関連する係数密度に関する事前知識があるために可能です。密度推定の柔軟なモデルにより、観測を正確にモデル化でき、実験結果では、シミュレートされた混合物だけでなく、2つの異なる音源の混合物を使用した実際の環境録音に対しても、高いレベルの分離性能が示されています。

A Multiscale Framework For Blind Separation of Linearly Mixed Signals
線形混合信号のブラインド分離のためのマルチスケールフレームワーク

We consider the problem of blind separation of unknown source signals or images from a given set of their linear mixtures. It was discovered recently that exploiting the sparsity of sources and their mixtures, once they are projected onto a proper space of sparse representation, improves the quality of separation. In this study we take advantage of the properties of multiscale transforms, such as wavelet packets, to decompose signals into sets of local features with various degrees of sparsity. We then study how the separation error is affected by the sparsity of decomposition coefficients, and by the misfit between the probabilistic model of these coefficients and their actual distribution. Our error estimator, based on the Taylor expansion of the quasi-ML function, is used in selection of the best subsets of coefficients and utilized, in turn, in further separation. The performance of the algorithm is evaluated by using noise-free and noisy data. Experiments with simulated signals, musical sounds and images, demonstrate significant improvement of separation quality over previously reported results.

私たちは、未知のソース信号または画像を、それらの線形混合の所定のセットからブラインド分離する問題について検討します。最近、ソースとその混合のスパース性を利用すると、それらを適切なスパース表現の空間に投影すると、分離の品質が向上することが発見されました。この研究では、ウェーブレットパケットなどのマルチスケール変換の特性を利用して、信号をさまざまな程度のスパース性を持つローカルフィーチャのセットに分解します。次に、分解係数のスパース性、およびこれらの係数の確率モデルと実際の分布の間の不適合によって分離エラーがどのように影響を受けるかを調べます。準ML関数のテイラー展開に基づくエラー推定器は、係数の最適なサブセットの選択に使用され、次に、さらなる分離に使用されます。アルゴリズムのパフォーマンスは、ノイズのないデータとノイズのあるデータを使用して評価されます。シミュレートされた信号、楽音、画像を使用した実験では、以前に報告された結果よりも分離品質が大幅に向上していることが実証されています。

Blind Separation of Post-nonlinear Mixtures using Linearizing Transformations and Temporal Decorrelation
線形化変換と時間的非相関を用いたポスト非線形混合のブラインド分離

We propose two methods that reduce the post-nonlinear blind source separation problem (PNL-BSS) to a linear BSS problem. The first method is based on the concept of maximal correlation: we apply the alternating conditional expectation (ACE) algorithm—a powerful technique from non-parametric statistics—to approximately invert the componentwise non-linear functions. The second method is a Gaussianizing transformation, which is motivated by the fact that linearly mixed signals before nonlinear transformation are approximately Gaussian distributed. This heuristic, but simple and efficient procedure works as good as the ACE method. Using the framework provided by ACE, convergence can be proven. The optimal transformations obtained by ACE coincide with the sought-after inverse functions of the nonlinearities. After equalizing the nonlinearities, temporal decorrelation separation (TDSEP) allows us to recover the source signals. Numerical simulations testing “ACE-TD” and “Gauss-TD” on realistic examples are performed with excellent results.

私たちは、ポスト非線形ブラインドソース分離問題(PNL-BSS)を線形BSS問題に縮小する2つの方法を提案します。最初の方法は、最大相関の概念に基づいています。交互条件付き期待値(ACE)アルゴリズム(非パラメトリック統計の強力な手法)を適用して、成分ごとの非線形関数を近似的に反転します。2番目の方法は、ガウス化変換です。これは、非線形変換前の線形混合信号が近似的にガウス分布しているという事実に基づいています。このヒューリスティックですが、シンプルで効率的な手順は、ACE方法と同様に機能します。ACEによって提供されるフレームワークを使用すると、収束を証明できます。ACEによって取得される最適な変換は、非線形性の求められている逆関数と一致します。非線形性を均等化した後、時間的非相関分離(TDSEP)によってソース信号を復元できます。実際の例で「ACE-TD」と「Gauss-TD」をテストする数値シミュレーションが実行され、優れた結果が得られました。

MISEP — Linear and Nonlinear ICA Based on Mutual Information
MISEP — 相互情報量に基づく線形および非線形ICA

Linear Independent Components Analysis (ICA) has become animportant signal processing and data analysis technique, thetypical application being blind source separation in a wide rangeof signals, such as biomedical, acoustical and astrophysical ones.Nonlinear ICA is less developed, but has the potential to becomeat least as powerful.This paper presents MISEP, an ICA technique for linear andnonlinear mixtures, which is based on the minimization of themutual information of the estimated components. MISEP is ageneralization of the popular INFOMAX technique, which is extendedin two ways: (1) to deal with nonlinear mixtures, and (2) to beable to adapt to the actual statistical distributions of thesources, by dynamically estimating the nonlinearities to be usedat the outputs. The resulting MISEP method optimizes a networkwith a specialized architecture, with a single objective function:the output entropy.The paper also briefly discusses the issue of nonlinear sourceseparation. Examples of linear and nonlinear source separationperformed by MISEP are presented.

線形独立成分分析(ICA)は、重要な信号処理およびデータ分析手法となり、その典型的な用途は、生物医学、音響、天体物理学などの幅広い信号におけるブラインドソース分離です。非線形ICAはあまり開発されていませんが、少なくとも同等に強力になる可能性があります。この論文では、推定されたコンポーネントの相互情報の最小化に基づく、線形および非線形混合用のICA手法であるMISEPについて説明します。MISEPは、一般的なINFOMAX手法を一般化したものであって、次の2つの方法で拡張されています。(1)非線形混合を処理するため、および(2)出力で使用される非線形性を動的に推定することにより、ソースの実際の統計分布に適応できるようにするため。結果として得られるMISEPメソッドは、出力エントロピーという単一の目的関数を持つ、特殊なアーキテクチャを持つネットワークを最適化します。この論文では、非線形ソース分離の問題についても簡単に説明します。MISEPによって実行される線形および非線形ソース分離の例を示します。

ICA Using Spacings Estimates of Entropy
エントロピーの間隔推定を使用したICA

This paper presents a new algorithm for the independent componentsanalysis (ICA) problem based on an efficient entropy estimator. Likemany previous methods, this algorithm directly minimizes the measureof departure from independence according to the estimatedKullback-Leibler divergence between the joint distribution and theproduct of the marginal distributions. We pair this approach withefficient entropy estimators from the statistics literature. Inparticular, the entropy estimator we use is consistent and exhibitsrapid convergence. The algorithm based on this estimator is simple,computationally efficient, intuitively appealing, and outperformsother well known algorithms. In addition, the estimator’s relativeinsensitivity to outliers translates into superior performance by ourICA algorithm on outlier tests. We present favorable comparisons tothe Kernel ICA, FAST-ICA, JADE, and extended Infomax algorithms inextensive simulations. We also provide public domain source code forour algorithms.

この論文では、効率的なエントロピー推定量に基づく独立成分分析(ICA)問題に対する新しいアルゴリズムを紹介します。以前の多くの方法と同様に、このアルゴリズムは、結合分布と周辺分布の積の間の推定されたカルバック・ライブラー・ダイバージェンスに従って、独立性からの逸脱の尺度を直接最小化します。このアプローチを、統計文献の効率的なエントロピー推定量と組み合わせます。特に、使用するエントロピー推定量は一貫性があり、収束が速いです。この推定量に基づくアルゴリズムはシンプルで、計算効率が高く、直感的に魅力的で、他のよく知られているアルゴリズムよりも優れています。さらに、推定量が外れ値に対して比較的鈍感であるため、外れ値テストでのICAアルゴリズムのパフォーマンスが優れています。広範なシミュレーションで、カーネルICA、FAST-ICA、JADE、および拡張Infomaxアルゴリズムとの良好な比較を示します。また、アルゴリズムのパブリックドメインソースコードも提供します。

Blind Source Separation via Generalized Eigenvalue Decomposition
一般化固有値分解によるブラインドソース分離

In this short note we highlight the fact that linear blind source separation can be formulated as a generalized eigenvalue decomposition under the assumptions of non-Gaussian, non-stationary, or non-white independent sources. The solution for the unmixing matrix is given by the generalized eigenvectors that simultaneously diagonalize the covariance matrix of the observations and an additional symmetric matrix whose form depends upon the particular assumptions. The method critically determines the mixture coefficients and is therefore not robust to estimation errors. However it provides a rather general and unified solution that summarizes the conditions for successful blind source separation. To demonstrate the method, which can be implemented in two lines of matlab code, we present results for artificial mixtures of speech and real mixtures of electroencephalography (EEG) data, showing that the same sources are recovered under the various assumptions.

この短いノートでは、線形ブラインドソース分離は、非ガウス、非定常、または非白の独立したソースの仮定の下で、一般化された固有値分解として定式化できるという事実を強調します。アンミキシング行列の解は、観測値の共分散行列を同時に対角化する一般化固有ベクトルと、特定の仮定に依存する形式を持つ追加の対称行列によって与えられます。この手法は混合係数を決定的に決定するため、推定誤差に対してロバストではありません。ただし、ブラインドイオン源分離を成功させるための条件をまとめた、かなり一般的で統一されたソリューションを提供します。2行のMATLABコードで実装できるこの方法を実証するために、音声の人工的な混合物と実際の脳波(EEG)データの混合物の結果を示し、さまざまな仮定の下で同じソースが回復されることを示します。

Energy-Based Models for Sparse Overcomplete Representations
スパース過完全表現のためのエネルギーベースモデル

We present a new way of extending independent components analysis(ICA) to overcomplete representations. In contrast to the causalgenerative extensions of ICA which maintain marginal independence ofsources, we define features as deterministic (linear)functions of the inputs. This assumption results in marginaldependencies among the features, but conditionalindependence of the features given the inputs. By assigning energiesto the features a probability distribution over the input states isdefined through the Boltzmann distribution. Free parameters of thismodel are trained using the contrastive divergence objective (Hinton, 2002). When the number of features is equal to the numberof input dimensions this energy-based model reduces to noiseless ICAand we show experimentally that the proposed learning algorithm isable to perform blind source separation on speech data. In additionalexperiments we train overcomplete energy-based models to extractfeatures from various standard data-sets containing speech, naturalimages, hand-written digits and faces.

私たちは、独立成分分析(ICA)を過剰表現に拡張する新しい方法を紹介します。ソースの限界独立性を維持するICAの因果生成拡張とは対照的に、特徴を入力の決定論的(線形)関数として定義します。この仮定により、特徴間に限界依存性が生じますが、入力が与えられた場合の特徴の条件付き独立性が生じます。特徴にエネルギーを割り当てることにより、ボルツマン分布を通じて入力状態にわたる確率分布が定義されます。このモデルの自由パラメータは、対照的発散目的(Hinton、2002)を使用してトレーニングされます。特徴の数が入力次元の数に等しい場合、このエネルギーベースのモデルはノイズのないICAに縮小され、提案された学習アルゴリズムが音声データに対してブラインドソース分離を実行できることを実験的に示します。追加の実験では、過剰完全なエネルギーベースのモデルをトレーニングして、音声、自然画像、手書きの数字、顔を含むさまざまな標準データセットから特徴を抽出します。

Beyond Independent Components: Trees and Clusters
独立したコンポーネントを超えて:ツリーとクラスター

We present a generalization of independent component analysis(ICA), where instead of looking for a linear transform that makesthe data components independent, we look for a transform thatmakes the data components well fit by a tree-structured graphicalmodel. This tree-dependent component analysis (TCA)provides a tractable and flexible approach to weakening theassumption of independence in ICA. In particular, TCA allows theunderlying graph to have multiple connected components, and thusthe method is able to find “clusters” of components such thatcomponents are dependent within a cluster and independent betweenclusters. Finally, we make use of a notion of graphical modelsfor time series due to Brillinger (1996) to extend these ideasto the temporal setting. In particular, we are able to fit modelsthat incorporate tree-structured dependencies among multiple timeseries.

私たちは、独立成分分析(ICA)の一般化を提示し、データ成分を独立させる線形変換を探す代わりに、データ成分をツリー構造のグラフィカルモデルに適切に適合させる変換を探します。このツリー依存成分分析(TCA)は、ICAの独立性の仮定を弱めるための扱いやすく柔軟なアプローチを提供します。特に、TCAでは、基礎となるグラフに複数の接続されたコンポーネントを含めることができるため、この方法では、コンポーネントがクラスター内で依存し、クラスター間で独立しているようなコンポーネントの「クラスター」を見つけることができます。最後に、Brillinger(1996)による時系列のグラフィカルモデルの概念を利用して、これらのアイデアを時間設定に拡張します。特に、複数の時系列間のツリー構造の依存関係を組み込んだモデルを適合させることができます。

Dependence, Correlation and Gaussianity in Independent Component Analysis
独立成分分析における依存性,相関性,ガウス性

Independent component analysis (ICA) is the decomposition of a random vector in linear components which are “as independent as possible.” Here, “independence” should be understood in its strong statistical sense: it goes beyond (second-order) decorrelation and thus involves the non-Gaussianity of the data. The ideal measure of independence is the “mutual information” and is known to be related to the entropy of the components when the search for components is restricted to uncorrelated components. This paper explores the connections between mutual information, entropy and non-Gaussianity in a larger framework, without resorting to a somewhat arbitrary decorrelation constraint. A key result is that the mutual information can be decomposed, under linear transforms, as the sum of two terms: one term expressing the decorrelation of the components and one expressing their non-Gaussianity. Our results extend the previous understanding of these connections and explain them in the light of information geometry. We also describe the “local geometry” of ICA by re-expressing all our results via a Gram-Charlier expansion by which all quantities of interest are obtained in terms of cumulants.

独立成分分析(ICA)は、ランダムベクトルを「可能な限り独立した」線形成分に分解するものです。ここで、「独立性」は、強い統計的意味で理解する必要があります。つまり、(2次)非相関性を超え、データの非ガウス性に関係します。独立性の理想的な尺度は「相互情報量」であり、成分の検索が非相関成分に限定されている場合、成分のエントロピーと関係することが知られています。この論文では、やや恣意的な非相関性制約に頼ることなく、相互情報量、エントロピー、非ガウス性の関係をより大きな枠組みで探ります。重要な結果は、線形変換の下で、相互情報量を2つの項の合計として分解できることです。1つは成分の非相関性を表す項で、もう1つは成分の非ガウス性を表します。私たちの結果は、これらの関係についてのこれまでの理解を拡張し、情報幾何学の観点から説明します。また、すべての関心のある量をキュムラントの観点から取得するグラム・シャルリエ展開を介してすべての結果を再表現することにより、ICAの「ローカルジオメトリ」を説明します。

Introduction to Special Issue on Independent Components Analysis
独立成分分析特集の紹介

An Approximate Analytical Approach to Resampling Averages
平均を再サンプリングするための近似分析アプローチ

Using a novel reformulation, we develop aframework to compute approximate resampling data averagesanalytically. The method avoids multiple retraining of statistical modelson the samples. Our approach uses a combination ofthe replica “trick” of statistical physics and the TAP approach forapproximateBayesian inference. We demonstrate our approach on regression with Gaussianprocesses. A comparison with averages obtained by Monte-Carlosampling shows that our method achieves good accuracy.

新しい再定式化を使用して、近似リサンプリングデータを平均的に分析的に計算するためのフレームワークを開発します。この方法では、サンプルに対する統計モデルの複数の再トレーニングが回避されます。私たちのアプローチは、統計物理学のレプリカ「トリック」と近似ベイズ推論のTAPアプローチの組み合わせを使用します。ガウス過程による回帰のアプローチを示します。Monte-Carlosamplingによって得られた平均と比較すると、私たちの方法は良好な精度を達成していることがわかります。

Least-Squares Policy Iteration
最小二乗法の反復

We propose a new approach to reinforcement learning for controlproblems which combines value-function approximation with lineararchitectures and approximate policy iteration. This new approach ismotivated by the least-squares temporal-difference learning algorithm(LSTD) for prediction problems, which is known for its efficient useof sample experiences compared to pure temporal-differencealgorithms. Heretofore, LSTD has not had a straightforward applicationto control problems mainly because LSTD learns the state valuefunction of a fixed policy which cannot be used for action selectionand control without a model of the underlying process. Our newalgorithm, least-squares policy iteration (LSPI), learns thestate-action value function which allows for action selection withouta model and for incremental policy improvement within apolicy-iteration framework. LSPI is a model-free, off-policy methodwhich can use efficiently (and reuse in each iteration) sampleexperiences collected in any manner. By separating thesample collection method, the choice of the linear approximationarchitecture, and the solution method, LSPI allows for focusedattention on the distinct elements that contribute to practicalreinforcement learning. LSPI is tested on the simple task ofbalancing an inverted pendulum and the harder task of balancing andriding a bicycle to a target location. In both cases, LSPI learns tocontrol the pendulum or the bicycle by merely observing a relativelysmall number of trials where actions are selected randomly. LSPI isalso compared against Q-learning (both with and without experiencereplay) using the same value function architecture. While LSPIachieves good performance fairly consistently on the difficult bicycletask, Q-learning variants were rarely able to balance for more thana small fraction of the time needed to reach the target location.

私たちは、制御問題に対する強化学習の新しいアプローチを提案します。これは、値関数近似と線形アーキテクチャおよび近似ポリシー反復を組み合わせたものです。この新しいアプローチは、純粋な時間差分アルゴリズムと比較してサンプルエクスペリエンスを効率的に使用することで知られる、予測問題に対する最小二乗時間差分学習アルゴリズム(LSTD)に触発されています。これまで、LSTDは制御問題に簡単には適用できませんでした。これは主に、LSTDが固定ポリシーの状態値関数を学習するため、基礎となるプロセスのモデルなしではアクションの選択や制御に使用できないためです。私たちの新しいアルゴリズムである最小二乗ポリシー反復(LSPI)は、状態アクション値関数を学習します。これにより、モデルなしでアクションを選択でき、ポリシー反復フレームワーク内でポリシーを段階的に改善できます。LSPIは、モデルフリーのオフポリシーメソッドであり、任意の方法で収集されたサンプルエクスペリエンスを効率的に使用(および各反復で再利用)できます。LSPIでは、サンプル収集方法、線形近似アーキテクチャの選択、およびソリューションメソッドを分離することで、実用的な強化学習に寄与する個別の要素に焦点を絞ることができます。LSPIは、倒立振り子のバランスをとるという単純なタスクと、自転車のバランスをとりながら目標地点まで走るというより難しいタスクでテストされます。どちらの場合も、LSPIは、アクションがランダムに選択される比較的少数の試行を観察するだけで、振り子または自転車の制御を学習します。LSPIは、同じ価値関数アーキテクチャを使用して、Q学習(経験再生ありとなしの両方)とも比較されます。LSPIは難しい自転車タスクでかなり一貫して優れたパフォーマンスを達成しますが、Q学習バリアントは、目標地点に到達するのに必要な時間のごく一部以上バランスをとることはほとんどありませんでした。

Sparseness of Support Vector Machines
サポートベクターマシンのスパース性

Support vector machines (SVMs) construct decision functions that are linear combinationsof kernel evaluations on the training set. The samples with non-vanishing coefficientsare called support vectors. In this work we establish lower (asymptotical)bounds on the number of support vectors. On our way we prove several resultswhich are of great importance for the understanding of SVMs.In particular, we describe to which “limit”SVM decision functions tend, discuss the corresponding notion of convergenceand provide some results on the stability of SVMs using subdifferential calculusin the associated reproducing kernel Hilbert space.

サポートベクターマシン(SVM)は、トレーニングセットのカーネル評価の線形結合である決定関数を構築します。消失係数のないサンプルは、サポートベクトルと呼ばれます。この作業では、サポートベクトルの数の下限(漸近)境界を確立します。私たちの道では、SVMs.In特定の理解にとって非常に重要ないくつかの結果を証明し、SVM決定関数がどの「限界」に傾向があるかを説明し、対応する収束の概念を議論し、関連する再現カーネルヒルベルト空間のサブ微分計算を使用してSVMの安定性に関するいくつかの結果を提供します。

Nash Q-Learning for General-Sum Stochastic Games
Nash Q-一般和確率ゲームのための学習

We extend Q-learning to a noncooperative multiagent context, using theframework of general-sum stochastic games. A learning agent maintainsQ-functions over joint actions, and performs updates based on assumingNash equilibrium behavior over the current Q-values. This learningprotocol provably converges given certain restrictions on the stagegames (defined by Q-values) that arise during learning. Experiments with a pair of two-playergrid games suggest that such restrictions on the game structure arenot necessarily required. Stage games encountered during learning in both grid environments violate the conditions.However, learningconsistently converges in the first grid game, which has a uniqueequilibrium Q-function, but sometimes fails to converge in thesecond, which has three different equilibrium Q-functions.In a comparison of offline learning performance inboth games, we find agents are more likely to reach a joint optimalpath with Nash Q-learning than with a single-agent Q-learningmethod. When at least one agent adopts Nash Q-learning,the performance of both agents is better than using single-agentQ-learning. We have also implemented an online version of NashQ-learning that balances exploration with exploitation,yielding improved performance.

私たちは、一般和確率ゲームのフレームワークを使用して、Q学習を非協力的なマルチエージェントコンテキストに拡張します。学習エージェントは、共同アクションに対してQ関数を維持し、現在のQ値に対してナッシュ均衡動作を仮定して更新を実行します。この学習プロトコルは、学習中に発生するステージゲーム(Q値によって定義)に対する特定の制限を与えられた場合に、確実に収束します。2人のプレーヤーによるグリッドゲームのペアを使用した実験では、ゲーム構造に対するこのような制限は必ずしも必要ではないことが示唆されています。両方のグリッド環境で学習中に遭遇するステージゲームは、条件に違反しています。ただし、学習は、一意の均衡Q関数を持つ最初のグリッドゲームでは一貫して収束しますが、3つの異なる均衡Q関数を持つ2番目のグリッドゲームでは収束に失敗することがあります。両方のゲームでのオフライン学習パフォーマンスを比較すると、エージェントは、単一エージェントQ学習方法よりもナッシュQ学習で共同最適パスに到達する可能性が高いことがわかります。少なくとも1つのエージェントがNash Q学習を採用すると、両方のエージェントのパフォーマンスは単一エージェントのQ学習を使用する場合よりも向上します。また、探索と活用のバランスをとるNashQ学習のオンラインバージョンも実装し、パフォーマンスが向上しました。

A Unified Framework for Model-based Clustering
モデルベースクラスタリングのための統合フレームワーク

Model-based clustering techniques have been widely used andhave shown promising results in many applications involving complex data.This paper presents a unified framework for probabilistic model-basedclustering based on a bipartite graph view of data and modelsthat highlights the commonalities and differences among existingmodel-based clustering algorithms.In this view, clusters are represented as probabilistic models ina model space that is conceptually separate from the data space.For partitional clustering, the view is conceptually similarto the Expectation-Maximization (EM) algorithm.For hierarchical clustering, the graph-based view helps to visualizecritical/important distinctionsbetween similarity-based approaches and model-based approaches.The framework also suggests several useful variations of existingclustering algorithms.Two new variations—balanced model-based clustering and hybridmodel-based clustering—are discussed and empirically evaluatedon a variety of data types.

モデルベースのクラスタリング手法は広く使用されており、複雑なデータを含む多くのアプリケーションで有望な結果を示しています。この論文では、データとモデルの二部グラフビューに基づく確率モデルベースのクラスタリングの統一フレームワークを提示し、既存のモデルベースのクラスタリングアルゴリズムの共通点と相違点を強調します。このビューでは、クラスターは、概念的にデータ空間から分離されたモデル空間内の確率モデルとして表されます。パーティションクラスタリングの場合、ビューは概念的にはExpectation-Maximization (EM)アルゴリズムと似ています。階層クラスタリングの場合、グラフベースのビューは、類似性ベースのアプローチとモデルベースのアプローチの間の重要な区別を視覚化するのに役立ちます。このフレームワークは、既存のクラスタリングアルゴリズムのいくつかの有用なバリエーションも提案しています。バランスの取れたモデルベースのクラスタリングとハイブリッドモデルベースのクラスタリング——2つの新しいバリエーションが、さまざまなデータタイプで議論され、経験的に評価されます。

Optimality of Universal Bayesian Sequence Prediction for General Loss and Alphabet
一般損失とアルファベットに対する普遍ベイズ数列予測の最適性

Various optimality properties of universal sequence predictorsbased on Bayes-mixtures in general, and Solomonoff’s predictionscheme in particular, will be studied.The probability of observing xt at time t, given pastobservations x1…xt-1 can be computed with the chain ruleif the true generating distribution μ of the sequencesx1x2x3…. is known. If μ is unknown, but known to belongto a countable or continuous class Μ one can base onesprediction on the Bayes-mixture ξ defined as awν-weighted sum or integral of distributions ν∈ Μ. Thecumulative expected loss of the Bayes-optimal universal predictionscheme based on ξ is shown to be close to the loss of theBayes-optimal, but infeasible prediction scheme based on μ. Weshow that the bounds are tight and that no other predictor canlead to significantly smaller bounds.Furthermore, for various performance measures, we showPareto-optimality of ξ and give an Occam’s razor argument thatthe choice wν &sim 2-K(ν) for the weights is optimal,where K(ν) is the length of the shortest program describingν.The results are applied to games of chance, defined as a sequenceof bets, observations, and rewards.The prediction schemes (and bounds) are compared to the popularpredictors based on expert advice.Extensions to infinite alphabets, partial, delayed andprobabilistic prediction, classification, and more active systemsare briefly discussed.

一般的なベイズ混合分布、特にソロモンオフの予測スキームに基づくユニバーサルシーケンス予測子のさまざまな最適性特性について検討します。過去の観測値x1…xt-1が与えられた場合、時刻tでxtが観測される確率は、シーケンスx1x2x3….の真の生成分布 μ がわかっている場合、連鎖律で計算できます。μ が不明だが、可算クラスまたは連続クラス Μ に属することがわかっている場合は、分布 ν∈ Μ のawν 加重和または積分として定義されるベイズ混合分布 ξ に基づいて予測を行うことができます。ξ に基づくベイズ最適なユニバーサル予測スキームの累積期待損失は、μ に基づくベイズ最適だが実行不可能な予測スキームの損失に近いことが示されています。境界が厳密であり、他の予測子ではこれより大幅に小さい境界を導くことができないことを示します。さらに、さまざまなパフォーマンス測定について、ξ のパレート最適性を示し、重みとしてwν&sim 2-K(ν)を選択することが最適であるというオッカムの剃刀の議論を示します。ここで、K(ν)は ν を記述する最短プログラムの長さです。結果は、賭け、観察、報酬のシーケンスとして定義される偶然のゲームに適用されます。予測スキーム(および境界)は、専門家のアドバイスに基づいて一般的な予測子と比較されます。無限アルファベット、部分的予測、遅延予測、確率的予測、分類、およびよりアクティブなシステムへの拡張について簡単に説明します。

An Efficient Boosting Algorithm for Combining Preferences
プリファレンスを組み合わせるための効率的なブースティングアルゴリズム

We study the problem of learning to accurately rank a set of objectsby combining a given collection of ranking or preference functions.This problem of combining preferences arises in several applications,such as that of combining the results of different search engines, orthe “collaborative-filtering” problem of ranking movies for a userbased on the movie rankings provided by other users.In this work, we begin by presenting a formal framework for thisgeneral problem.We then describe and analyze an efficient algorithm called RankBoost for combining preferences based on the boosting approach to machinelearning.We give theoretical results describing the algorithm’s behavior bothon the training data, and on new test data not seen during training.We also describe an efficient implementation of thealgorithm for a particular restricted but common case.We next discuss two experiments we carriedout to assess the performance of RankBoost. In the first experiment,we used the algorithm to combine different web search strategies, each ofwhich is a query expansion for a given domain.The second experiment is a collaborative-filtering taskfor making movie recommendations.

私たちは、与えられたランキング関数または嗜好関数の集合を組み合わせることによって、一連のオブジェクトを正確にランク付けする方法を学習する問題を研究します。嗜好を組み合わせるこの問題は、さまざまな検索エンジンの結果を組み合わせる問題や、他のユーザーが提供した映画のランキングに基づいてユーザーのために映画をランク付けする「協調フィルタリング」問題など、いくつかのアプリケーションで発生します。この研究では、まずこの一般的な問題の正式なフレームワークを提示します。次に、機械学習のブースティングアプローチに基づいて嗜好を組み合わせるためのRankBoostと呼ばれる効率的なアルゴリズムについて説明し、分析します。トレーニングデータと、トレーニング中に見られなかった新しいテストデータの両方でのアルゴリズムの動作を説明する理論的な結果を示します。また、特定の制限付きだが一般的なケースに対するアルゴリズムの効率的な実装についても説明します。次に、RankBoostのパフォーマンスを評価するために実行した2つの実験について説明します。最初の実験では、アルゴリズムを使用して、それぞれが特定のドメインのクエリ拡張であるさまざまなWeb検索戦略を組み合わせました。2番目の実験は、映画の推奨を行うための協調フィルタリングタスクです。

Learning over Sets using Kernel Principal Angles
カーネル主角度を使用したセットの学習

We consider the problem of learning with instances defined over a space ofsets of vectors. We derive a new positive definite kernel f(A,B) defined overpairs of matrices A,B based on the concept of principal angles between twolinear subspaces. We show that the principal angles can be recoveredusing only inner-products between pairs of column vectors of the inputmatrices thereby allowing the original column vectors of A,B to bemapped onto arbitrarily high-dimensional feature spaces. We demonstrate the usage of the matrix-based kernel function f(A,B)with experiments on two visual tasks. The first task is the discrimination of “irregular” motion trajectory of an individual or agroup of individuals in a video sequence. We use the SVM approachusing f(A,B) where an input matrixrepresents the motion trajectory of a group of individuals over acertain (fixed) time frame. We show that the classification(irregular versus regular) greatly outperforms theconventional representation where all the trajectories form a singlevector. The second application is the visual recognition of faces frominput video sequences representing head motion and facial expressionswhere f(A,B) is used to compare two image sequences.

私たちは、ベクトル集合の空間で定義されたインスタンスによる学習の問題を考察します。2つの線形部分空間間の主角の概念に基づいて、行列A、Bのペアで定義された新しい正定値カーネルf(A,B)を導出します。入力行列の列ベクトルのペア間の内積のみを使用して主角を復元できることを示し、これによりA、Bの元の列ベクトルを任意の高次元の特徴空間にマッピングできます。2つの視覚タスクの実験で、行列ベースのカーネル関数f(A,B)の使用方法を示します。最初のタスクは、ビデオシーケンス内の個人または個人のグループの「不規則な」動きの軌跡を識別することです。入力行列が特定の(固定)時間枠での個人のグループの動きの軌跡を表すf(A,B)を使用してSVMアプローチを使用します。私たちは、分類（不規則対規則）が、すべての軌跡が単一のベクトルを形成する従来の表現よりも大幅に優れていることを示しています。2番目のアプリケーションは、頭の動きと顔の表情を表す入力ビデオシーケンスからの顔の視覚認識です。f(A,B)は、2つの画像シーケンスを比較するために使用されます。

Concentration Inequalities for the Missing Mass and for Histogram Rule Error
欠損質量とヒストグラムルール誤差の濃度の不等式

This paper gives distribution-free concentrationinequalities for the missing mass and the error rate of histogram rules.Negative association methods can be used to reduce these concentrationproblems to concentration questions about independent sums. Althoughthe sums are independent, they are highly heterogeneous. Suchhighly heterogeneous independent sums cannot be analyzed using standardconcentration inequalities such as Hoeffding’s inequality, theAngluin-Valiant bound, Bernstein’s inequality, Bennett’s inequality,or McDiarmid’s theorem. The concentration inequality for histogram rule erroris motivated by the desire to construct a new class of bounds on the generalizationerror of decision trees.

この論文では、ヒストグラムルールの欠損質量とエラー率について、分布のない濃度の不等式を示します。負の関連付け法を使用して、これらの集中問題を独立和に関する集中の問題に減らすことができます。合計は独立していますが、非常に不均一です。このような非常に不均質な独立和は、Hoeffdingの不等式、Angluin-Valiant限界、Bernsteinの不等式、Bennettの不等式、McDiarmidの定理などの標準濃度の不等式を使用して分析することはできません。ヒストグラムルール誤差の濃度不等式は、決定木の一般化誤差に新しいクラスの境界を構築したいという願望によって動機付けられています。

On the Rate of Convergence of Regularized Boosting Classifiers
正則化ブースティング分類器の収束率について

A regularized boosting method is introduced, for which regularizationis obtained through a penalization function. It is shown throughoracle inequalities that this method is model adaptive. The rate ofconvergence of the probability of misclassification is investigated.It is shown that forquite a large class of distributions, the probability of errorconverges to the Bayes risk at a ratefaster than n-(V+2)/(4(V+1)) where V is the VC dimensionof the “base” class whose elements are combined by boosting methodsto obtain an aggregated classifier. The dimension-independent natureof the rates may partially explain the good behavior of these methodsin practical problems. Under Tsybakov’s noise condition the rate ofconvergence is even faster. We investigate the conditions necessary toobtain such rates for different base classes. The special case ofboosting using decision stumps is studied in detail. We characterizethe class of classifiers realizable by aggregating decision stumps. Itis shown that some versions of boosting work especially well inhigh-dimensional logistic additive models. It appears that adding alimited labelling noise to the training data may in certain casesimprove the convergence, as has been also suggested by other authors.

正規化ブースティング法が導入され、ペナルティ関数によって正規化が実現されます。オラクル不等式によって、この方法がモデル適応型であることが示されます。誤分類の確率の収束率を調査します。かなり大規模な分布クラスでは、エラーの確率がベイズリスクに収束する速度はn-(V+2)/(4(V+1))よりも速いことが示されます。ここで、Vは「基本」クラスのVC次元であり、その要素はブースティング法によって結合されて集約分類子を得ます。この速度の次元非依存の性質は、実際の問題におけるこれらの方法の良好な動作を部分的に説明できます。Tsybakovのノイズ条件下では、収束率はさらに速くなります。さまざまな基本クラスでこのような速度を得るために必要な条件を調査します。決定スタンプを使用したブースティングの特殊なケースを詳細に研究します。決定スタンプを集約することで実現可能な分類子のクラスを特徴付けます。ブースティングのいくつかのバージョンは、高次元ロジスティック加法モデルで特にうまく機能することが示されています。他の著者によっても示唆されているように、トレーニングデータに制限されたラベル付けノイズを追加すると、特定のケースで収束が改善される可能性があるようです。

Generalization Error Bounds for Bayesian Mixture Algorithms
ベイズ混合アルゴリズムの一般化誤差範囲

Bayesian approaches to learning and estimation have played asignificant role in the Statistics literature over many years.While they are often provably optimal in a frequentist setting,and lead to excellent performance in practical applications, therehave not been many precise characterizations of their performancefor finite sample sizes under general conditions. In this paper weconsider the class of Bayesian mixture algorithms, where anestimator is formed by constructing a data-dependent mixture oversome hypothesis space. Similarly to what is observed in practice,our results demonstrate that mixture approaches are particularlyrobust, and allow for the construction of highly complexestimators, while avoiding undesirable overfitting effects. Ourresults, while being data-dependent in nature, are insensitive tothe underlying model assumptions, and apply whether or not thesehold. At a technical level, the approach applies to unboundedfunctions, constrained only by certain moment conditions. Finally,the bounds derived can be directly applied to non-Bayesian mixtureapproaches such as Boosting and Bagging.

学習と推定に対するベイズ的アプローチは、長年にわたって統計学の文献で重要な役割を果たしてきました。ベイズ的アプローチは、頻度主義の設定では最適であることが証明されることが多く、実際のアプリケーションでは優れたパフォーマンスをもたらしますが、一般的な条件下での有限サンプルサイズのパフォーマンスを正確に特徴付けることはあまりありません。この論文では、ベイズ混合アルゴリズムのクラスを検討します。このアルゴリズムでは、推定量は、仮説空間でデータ依存の混合を構築することによって形成されます。実際に観察されるものと同様に、私たちの結果は、混合アプローチが特に堅牢であり、望ましくないオーバーフィッティング効果を回避しながら、非常に複雑な推定量を構築できることを実証しています。私たちの結果は、本質的にデータに依存しますが、基礎となるモデルの仮定には影響されず、これらの仮定が成り立つかどうかに関係なく適用されます。技術的なレベルでは、このアプローチは、特定のモーメント条件によってのみ制約される、無制限の関数に適用されます。最後に、導出された境界は、ブースティングやバギングなどの非ベイズ混合アプローチに直接適用できます。

Tracking Linear-threshold Concepts with Winnow
Winnowによる線形しきい値の概念の追跡

In this paper, we give a mistake-bound for learning arbitrary linear-threshold concepts that are allowed to change over time in the on-line model of learning. We use a variation of the Winnow algorithm and show that the bounds for learning shifting linear-threshold functions have many of the same advantages that the traditional Winnow algorithm has on fixed concepts. These benefits include a weak dependence on the number of irrelevant attributes, inexpensive runtime, and robust behavior against noise. In fact, we show that the bound for tracking Winnow has even better performance with respect to irrelevant attributes. Let X∈[0,1]n be an instance of the learning problem. In the previous bounds, the number of mistakes depends on lnn. In this paper, the shifting concept bound depends on max ln(||X||1). We show that this behavior is a result of certain parameter choices in the tracking version of Winnow, and we show how to use related parameters to get a similar mistake bound for the traditional fixed concept version of Winnow.

この論文では、オンライン学習モデルで時間の経過とともに変化することが許される任意の線形しきい値概念を学習するためのミス境界を示します。Winnowアルゴリズムのバリエーションを使用し、シフト線形しきい値関数を学習するための境界には、従来のWinnowアルゴリズムが固定概念に対して持つのと同じ利点が数多くあることを示します。これらの利点には、無関係な属性の数への弱い依存、安価な実行時間、ノイズに対する堅牢な動作が含まれます。実際、追跡Winnowの境界は、無関係な属性に関してさらに優れたパフォーマンスを発揮することを示します。X∈[0,1]nを学習問題のインスタンスとします。以前の境界では、ミスの数はlnnに依存していました。この論文では、シフト概念の境界はmax ln(||X||1)に依存します。この動作は、追跡バージョンのWinnowでの特定のパラメーター選択の結果であることを示し、関連するパラメーターを使用して、従来の固定概念バージョンのWinnowで同様のミス境界を取得する方法を示します。

Path Kernels and Multiplicative Updates
パスカーネルと乗算型更新

Kernels are typically applied to linear algorithmswhose weight vector is a linear combinationof the feature vectors of the examples.On-line versions of these algorithms are sometimescalled “additive updates” because they add a multiple of the lastfeature vector to the current weight vector.In this paper we have found a way to use special convolutionkernels to efficiently implement “multiplicative” updates.The kernels are defined by a directed graph.Each edge contributes an input.The inputs along a path form a product feature andall such products build the feature vector associatedwith the inputs.We also have a set of probabilities on the edges sothat the outflow from each vertex is one.We then discuss multiplicative updates on thesegraphs where the prediction is essentially a kernelcomputation and the update contributes a factor to each edge.After adding the factors to the edges,the total outflow out of each vertex is notone any more. However some clever algorithms re-normalizethe weights on the paths so that the totaloutflow out of each vertex is one again.Finally, we show that if the digraph is builtfrom a regular expressions, then this canbe used for speedingup the kernel and re-normalization computations.We reformulate a large number of multiplicativeupdate algorithms using path kernelsand characterize the applicability of our method.The examples includeefficient algorithms for learning disjunctionsand a recent algorithm that predicts as wellas the best pruning of a series parallel digraphs.

カーネルは、通常、重みベクトルが例の特徴ベクトルの線形結合である線形アルゴリズムに適用されます。これらのアルゴリズムのオンラインバージョンは、最後の特徴ベクトルの倍数を現在の重みベクトルに追加するため、「加法更新」と呼ばれることがあります。この論文では、特殊な畳み込みカーネルを使用して「乗法」更新を効率的に実装する方法を見つけました。カーネルは有向グラフによって定義されます。各エッジは入力を提供します。パスに沿った入力は積特徴を形成し、そのようなすべての積は、入力に関連付けられた特徴ベクトルを構築します。また、各頂点からのアウトフローが1になるように、エッジに一連の確率があります。次に、予測が基本的にカーネル計算であり、更新が各エッジに係数を提供するこれらのグラフでの乗法更新について説明します。エッジに係数を追加すると、各頂点からの合計アウトフローは1ではなくなります。ただし、一部の巧妙なアルゴリズムは、パスの重みを再正規化して、各頂点からの合計出力が再び1になるようにします。最後に、有向グラフが正規表現から構築されている場合、これを使用してカーネルと再正規化の計算を高速化できることを示します。パスカーネルを使用して多数の乗法更新アルゴリズムを再定式化し、この方法の適用可能性を特徴付けます。例には、選言を学習するための効率的なアルゴリズムや、直列並列有向グラフの最適な剪定を予測する最新のアルゴリズムが含まれます。

On the Performance of Kernel Classes
カーネルクラスのパフォーマンスについて

We present sharp bounds on the localized Rademacher averages ofthe unit ball in a reproducing kernel Hilbert space in terms ofthe eigenvalues of the integral operator associated with thekernel. We use this result to estimate the performance of theempirical minimization algorithm when the base class is the unitball of the reproducing kernel Hilbert space.

私たちは、再生カーネルヒルベルト空間における単位ボールの局在化ラデマッハー平均に、カーネルに関連付けられた積分演算子の固有値に関して鋭い境界を示します。この結果を使用して、基本クラスが再現カーネルヒルベルト空間のユニットボールである場合の経験的最小化アルゴリズムの性能を推定します。

Introduction to the Special Issue on Learning Theory
学習理論特集号の紹介

Tree-Structured Neural Decoding
ツリー構造ニューラルデコーディング

We propose adaptive testing as a general mechanism for extractinginformation about stimuli from spike trains. Each test or questioncorresponds to choosing a neuron and a time interval and checking fora given number of spikes. No assumptions are made about thedistribution of spikes or any other aspect of neural encoding. Thechosen questions are those which most reduce the uncertainty about thestimulus, as measured by entropy and estimated from stimulus-responsedata. Our experiments are based on accurate simulations of responsesto pure tones in the auditory nerve and are meant to illustrate theideas rather than investigate the auditory system. The results coherenicely with well-understood encoding of amplitude and frequency in theauditory nerve, suggesting that adaptive testing might provide apowerful tool for investigating complex and poorly understood neuralstructures.

私たちは、スパイクトレインから刺激に関する情報を抽出するための一般的なメカニズムとして、適応テストを提案します。各テストまたは質問は、ニューロンと時間間隔を選択し、特定の数のスパイクをチェックすることに対応しています。スパイクの分布やニューラルエンコーディングの他の側面については仮定は行われていません。選択された問題は、エントロピーによって測定され、刺激反応データから推定される、刺激に関する不確実性を最も軽減する問題です。私たちの実験は、聴覚神経の純粋なトーンに対する反応の正確なシミュレーションに基づいており、聴覚系を調査するのではなく、アイデアを説明することを目的としています。結果は、聴覚神経の振幅と周波数のよく理解された符号化と一貫しており、適応テストが複雑で理解が不十分な神経構造を調査するための強力なツールを提供する可能性があることを示唆しています。

Greedy Algorithms for Classification — Consistency, Convergence Rates, and Adaptivity
分類のための貪欲アルゴリズム — 一貫性、収束率、および適応性

Many regression and classification algorithms proposed over the years can be described as greedy procedures for the stagewiseminimization of an appropriate cost function. Some examplesinclude additive models, matching pursuit, and boosting. In thiswork we focus on the classification problem, for which many recentalgorithms have been proposed and applied successfully. For aspecific regularized form of greedy stagewise optimization, weprove consistency of the approach under rather general conditions.Focusing on specific classes of problems we provide conditionsunder which our greedy procedure achieves the (nearly) minimaxrate of convergence, implying that the procedure cannot beimproved in a worst case setting. We also construct a fullyadaptive procedure, which, without knowing the smoothnessparameter of the decision boundary, converges at the same rate asif the smoothness parameter were known.

長年にわたって提案されてきた多くの回帰アルゴリズムと分類アルゴリズムは、適切なコスト関数の段階的最小化のための貪欲な手順として説明できます。例としては、加法モデル、マッチング追跡、ブースティングなどがあります。この作業では、最近の多くのアルゴリズムが提案され、成功裏に適用されている分類問題に焦点を当てます。貪欲なステージごとの最適化の特定の正則化された形式については、かなり一般的な条件下でのアプローチの一貫性を証明します。問題の特定のクラスに焦点を当てると、私たちの貪欲な手順が収束の(ほぼ)最小最大率を達成する条件を提供し、最悪の場合の設定では手順を改善できないことを意味します。また、決定境界の平滑性パラメータを知らなくても、平滑性パラメータがわかっているのと同じ速度で収束する完全適応型手順を構築します。

Comparing Bayes Model Averaging and Stacking When Model Approximation Error Cannot be Ignored
モデル近似誤差を無視できない場合のベイズモデルの平均化とスタッキングの比較

We compare Bayes Model Averaging, BMA, to a non-Bayes form of modelaveraging called stacking. In stacking, the weights are no longer posterior probabilities of models; they are obtained by a technique based on cross-validation. When the correct data generating model (DGM) is on the list ofmodels under consideration BMA is never worse thanstacking and often is demonstrably better, provided thatthe noise level is of order commensurate with the coefficients andexplanatory variables.Here, however, we focus on the case that the correct DGM is not on the model list and may not be well approximated by the elements on the model list.We give a sequence of computed examples by choosing model lists and DGM’s to contrast the risk performance of stacking and BMA. In the first examples, the model lists are chosen to reflect geometric principles that shouldgive good performance. In these cases, stacking typically outperforms BMA, sometimes by a wide margin. In the second set of examples we examine how stacking and BMA perform when the model listincludes all subsets of a set of potential predictors.When we standardize the size of terms and coefficients in thissetting, we find that BMA outperforms stacking when the deviant terms in the DGM ‘point’ in directions accommodated by the model list but that when the deviant termpoints outside the model list stacking seems to do better.Overall, our results suggest the stacking has better robustnessproperties than BMA in the most important settings.

私たちは、ベイズモデル平均化(BMA)を、スタッキングと呼ばれる非ベイズ形式のモデル平均化と比較します。スタッキングでは、重みはモデルの事後確率ではなくなり、クロス検証に基づく手法によって取得されます。適切なデータ生成モデル(DGM)が検討中のモデルのリストにある場合、ノイズレベルが係数と説明変数に見合ったオーダーであれば、BMAがスタッキングより劣ることはなく、多くの場合、明らかに優れています。ただし、ここでは、適切なDGMがモデルリストになく、モデルリストの要素によって適切に近似されない可能性があるケースに焦点を当てます。モデルリストとDGMを選択して、スタッキングとBMAのリスクパフォーマンスを比較する一連の計算例を示します。最初の例では、モデルリストは、優れたパフォーマンスをもたらす幾何学的原理を反映するように選択されています。これらのケースでは、スタッキングは通常、BMAよりも優れており、場合によっては大幅に優れています。2番目の例では、モデルリストに潜在的な予測子のセットのすべてのサブセットが含まれている場合に、スタッキングとBMAがどのように機能するかを調べます。この設定で項と係数のサイズを標準化すると、DGM内の逸脱項がモデルリストで対応される方向を「指している」場合はBMAがスタッキングよりも優れていますが、逸脱項がモデルリストの外側を指している場合はスタッキングの方が優れていることがわかります。全体として、私たちの結果は、スタッキングが最も重要な設定でBMAよりも優れた堅牢性特性を持っていることを示唆しています。

Speedup Learning for Repair-based Search by Identifying Redundant Steps
冗長なステップの特定による修理ベースの検索の学習の高速化

Repair-based search algorithms start with an initial solution andattempt to improve it by iteratively applying repair operators.Such algorithms can often handle large-scale problems that may bedifficult for systematic search algorithms. Nevertheless, thecomputational cost of solving such problems is still very high.We observed that many of the repair steps applied by suchalgorithms are redundant in the sense that they do not eventuallycontribute to finding a solution. Such redundant steps areparticularly harmful in repair-based search, where each stepcarries high cost due to the very high branching factor typicallyassociated with it.Accurately identifying and avoiding such redundant steps wouldresult in faster local search without harming the algorithm’sproblem-solving ability. In this paper we propose a speeduplearning methodology for attaining this goal. It consists of thefollowing steps: defining the concept of a redundant step;acquiring this concept during off-line learning by analyzingsolution paths for training problems, tagging all the steps alongthe paths according to the redundancy definition and using aninduction algorithm to infer a classifier based on the taggedexamples; and using the acquired classifier to filter out redundantsteps while solving unseen problems.Our algorithm was empirically tested on instances of real-worldemployee timetabling problems (ETP). The problem solver to beimproved is based on one of the best methods for solving somelarge ETP instances. Our results show a significant improvementin speed for test problems that are similar to the given exampleproblems.

修復ベースの検索アルゴリズムは、初期ソリューションから開始し、修復演算子を繰り返し適用してソリューションの改善を試みます。このようなアルゴリズムは、体系的な検索アルゴリズムでは難しい大規模な問題も処理できます。ただし、このような問題を解決するには計算コストが依然として非常に高くなります。このようなアルゴリズムで適用される修復ステップの多くは、最終的にはソリューションの発見に役立たないという意味で冗長であることが分かりました。このような冗長ステップは、通常、非常に高い分岐係数を伴うため、各ステップに高いコストがかかる修復ベースの検索では特に有害です。このような冗長ステップを正確に識別して回避すると、アルゴリズムの問題解決能力を損なうことなく、ローカル検索を高速化できます。この論文では、この目標を達成するための高速学習方法を提案します。このアルゴリズムは、以下のステップから構成されます。冗長ステップの概念を定義します。オフライン学習中にトレーニング問題の解決パスを分析してこの概念を獲得します。冗長定義に従ってパスに沿ったすべてのステップにタグを付け、誘導アルゴリズムを使用してタグ付けされた例に基づいて分類子を推論します。獲得した分類子を使用して、未知の問題を解決する際に冗長ステップを除外します。このアルゴリズムは、実際の従業員のタイムテーブル作成問題(ETP)のインスタンスで実験的にテストされました。改善される問題ソルバーは、いくつかの大規模なETPインスタンスを解決するための最良の方法の1つに基づいています。結果は、指定された例の問題に類似したテスト問題で速度が大幅に向上することを示しています。

Smooth Boosting and Learning with Malicious Noise
悪意のあるノイズによるスムーズなブースティングと学習

We describe a new boosting algorithm which generates only smoothdistributions which do not assign too much weight to any single example. We show that this new boosting algorithm can be used to constructefficient PAC learning algorithms which tolerate relatively high rates ofmalicious noise. In particular, we use the new smooth boosting algorithmto construct malicious noise tolerant versions of the PAC-model p-normlinear threshold learning algorithms described by Servedio (2002).The bounds on sample complexity and malicious noise tolerance of these newPAC algorithms closely correspond to known bounds for the online p-normalgorithms of Grove, Littlestone and Schuurmans (1997)and Gentile and Littlestone (1999).As special cases of our newalgorithms we obtain linear threshold learning algorithms which match thesample complexity and malicious noise tolerance of the online Perceptronand Winnow algorithms. Our analysis reveals an interesting connectionbetween boosting and noise tolerance in the PAC setting.

私たちは、単一の例に過度の重み付けを割り当てない滑らかな分布のみを生成する新しいブースティングアルゴリズムについて説明します。この新しいブースティングアルゴリズムは、比較的高い割合の悪意のあるノイズを許容する効率的なPAC学習アルゴリズムの構築に使用できることを示します。特に、我々は新しい滑らかなブースティングアルゴリズムを使用して、Servedio (2002)によって説明されたPACモデルp正規線形しきい値学習アルゴリズムの悪意のあるノイズ耐性バージョンを構築します。これらの新しいPACアルゴリズムのサンプル複雑性と悪意のあるノイズ耐性の境界は、Grove、Littlestone、Schuurmans (1997)およびGentile、Littlestone (1999)のオンラインp正規アルゴリズムの既知の境界と密接に対応しています。我々の新しいアルゴリズムの特別なケースとして、我々はオンラインパーセプトロンおよびWinnowアルゴリズムのサンプル複雑性と悪意のあるノイズ耐性に一致する線形しきい値学習アルゴリズムを取得します。我々の分析は、PAC設定におけるブースティングとノイズ耐性の興味深い関係を明らかにします。

Inducing Grammars from Sparse Data Sets: A Survey of Algorithms and Results
スパースデータセットからの文法の誘導:アルゴリズムと結果の調査

This paper provides a comprehensive survey of the field of grammar induction applied to randomly generated languages using sparse example sets.

この論文では、スパース例セットを使用してランダムに生成された言語に適用される文法帰納法の分野を包括的に調査します。

The Principled Design of Large-Scale Recursive Neural Network Architectures–DAG-RNNs and the Protein Structure Prediction Problem
大規模再帰的ニューラルネットワークアーキテクチャの原理設計–DAG-RNNとタンパク質構造予測問題

We describe a general methodology for the design of large-scale recursive neural network architectures (DAG-RNNs) which comprises three fundamental steps: (1) representation of a given domain using suitable directed acyclic graphs (DAGs) to connect visible and hidden node variables; (2) parameterization of the relationship between each variable and its parent variables by feedforward neural networks; and (3) application of weight-sharing within appropriate subsets of DAG connections to capture stationarity and control model complexity. Here we use these principles to derive several specific classes of DAG-RNN architectures based on lattices, trees, and other structured graphs. These architectures can process a wide range of data structures with variable sizes and dimensions. While the overall resulting models remain probabilistic, the internal deterministic dynamics allows efficient propagation of information, as well as training by gradient descent, in order to tackle large-scale problems. These methods are used here to derivestate-of-the-art predictors for protein structural features such as secondary structure (1D) and both fine- and coarse-grained contact maps(2D). Extensions, relationships to graphical models, and implications for the design of neural architectures are briefly discussed. The protein prediction servers are available over the Web at: www.igb.uci.edu/tools.htm.

私たちは、大規模再帰ニューラルネットワークアーキテクチャ（DAG-RNN）の設計のための一般的な方法論について説明します。この方法は、（1）適切な有向非巡回グラフ（DAG）を使用して特定のドメインを表現し、可視ノード変数と非表示ノード変数を接続します。（2）フィードフォワードニューラルネットワークによって各変数とその親変数の関係をパラメーター化します。（3）定常性を捉え、モデルの複雑さを制御するために、DAG接続の適切なサブセット内で重み共有を適用します。ここでは、これらの原則を使用して、格子、木、その他の構造化グラフに基づくDAG-RNNアーキテクチャのいくつかの特定のクラスを導出します。これらのアーキテクチャは、さまざまなサイズと次元を持つさまざまなデータ構造を処理できます。結果として得られるモデルは全体的に確率的のままですが、内部の決定論的ダイナミクスにより、大規模な問題に取り組むために、情報の効率的な伝播と勾配降下法によるトレーニングが可能になります。ここでは、これらの方法を使用して、二次構造(1D)や細粒度および粗粒度の接触マップ(2D)などのタンパク質構造特性の最先端の予測子を導き出します。拡張機能、グラフィカルモデルとの関係、およびニューラルアーキテクチャの設計への影響について簡単に説明します。タンパク質予測サーバーは、Webでwww.igb.uci.edu/tools.htmから入手できます。

On Inclusion-Driven Learning of Bayesian Networks
ベイジアンネットワークのインクルージョン駆動型学習について

Two or more Bayesian network structures are Markov equivalent when thecorresponding acyclic digraphs encode the same set of conditionalindependencies. Therefore, the search space of Bayesian network structures maybe organized in equivalence classes, where each of them represents adifferent set of conditional independencies. The collection of sets ofconditional independencies obeys a partial order, the so-called “inclusionorder.”This paper discusses in depth the role that the inclusion order plays inlearning the structure of Bayesian networks. In particular, this role involvesthe way a learning algorithm traverses the search space. We introduce acondition for traversal operators, the inclusion boundary condition,which, when it is satisfied, guarantees that the search strategy can avoidlocal maxima. This is proved under the assumptions that the data is sampledfrom a probability distribution which is faithful to an acyclic digraph,and the length of the sample is unbounded.The previous discussion leads to the design of a new traversal operator andtwo new learning algorithms in the context of heuristic search and the MarkovChain Monte Carlo method. We carry out a set of experiments with synthetic andreal-world data that show empirically the benefit of striving for theinclusion order when learning Bayesian networks from data.

2つ以上のベイジアンネットワーク構造は、対応する非巡回ダイグラフが同じ条件付き独立性のセットをエンコードする場合、マルコフ等価です。したがって、ベイジアンネットワーク構造の検索空間は、それぞれが異なる条件付き独立性のセットを表す同値クラスに編成できます。条件付き独立性のセットのコレクションは、いわゆる「包含順序」と呼ばれる部分的な順序に従います。この論文では、ベイジアンネットワークの構造を学習する際に包含順序が果たす役割について詳しく説明します。特に、この役割には、学習アルゴリズムが検索空間をトラバースする方法が関係します。トラバース演算子の条件である包含境界条件を導入します。これが満たされると、検索戦略が局所的最大値を回避できることが保証されます。これは、データが非巡回有向グラフに忠実な確率分布からサンプリングされ、サンプルの長さが無制限であるという仮定の下で証明されています。これまでの議論は、ヒューリスティック検索とマルコフ連鎖モンテカルロ法のコンテキストで、新しいトラバーサル演算子と2つの新しい学習アルゴリズムの設計につながっています。私たちは、データからベイジアンネットワークを学習するときに包含順序を追求することの利点を経験的に示す、合成データと実際のデータを使用した一連の実験を実行します。

Learning Semantic Lexicons from a Part-of-Speech and Semantically Tagged Corpus Using Inductive Logic Programming
帰納的論理プログラミングを用いた品詞および意味タグ付きコーパスからの意味語彙の学習

This paper describes an inductive logic programming learning method designed toacquire from a corpus specific Noun-Verb (N-V) pairs—relevant in informationretrieval applications to perform index expansion—inorder to build up semantic lexicons based on Pustejovsky’s generative lexicon(GL) principles (Pustejovsky, 1995). In one of the components of this lexical model,called the qualia structure, words are described in terms of semanticroles. For example, the telic role indicates the purpose or function ofan item (cut for knife), the agentive role its creation mode(build for house), etc. The qualia structure of a noun ismainly made up of verbal associations, encoding relational information. Thelearning method enables us toautomatically extract, from a morpho-syntactically and semantically taggedcorpus, N-V pairs whose elements are linked by one of the semantic relationsdefined in the qualia structure in GL. It also infers rules explaining what inthe surrounding context distinguishes such pairs from others alsofound in sentences of the corpus but which are not relevant. Stress is put here on thelearning efficiency that is required to be able to deal with all the availablecontextual information, and to produce linguistically meaningful rules.

この論文では、情報検索アプリケーションでインデックス拡張を実行する際に関連する特定の名詞-動詞(N-V)ペアをコーパスから取得し、Pustejovskyの生成語彙(GL)原理(Pustejovsky、1995)に基づく意味語彙を構築するように設計された帰納的論理プログラミング学習法について説明します。この語彙モデルのコンポーネントの1つであるクオリア構造では、単語は意味役割で記述されます。たとえば、目的役割はアイテムの目的または機能(ナイフの場合はcut)を示し、動作役割は作成モード(家の場合はbuild)などを示します。名詞のクオリア構造は主に動詞の関連で構成され、関係情報をエンコードします。この学習法によって、形態統語論的および意味論的にタグ付けされたコーパスから、GLのクオリア構造で定義された意味関係の1つによって要素がリンクされているN-Vペアを自動的に抽出できます。また、周囲のコンテキストでこのようなペアがコーパスの文に見られるが関連しない他のペアと区別される理由を説明するルールも推測します。ここでは、利用可能なすべてのコンテキスト情報を処理し、言語的に意味のあるルールを生成するために必要な学習効率に重点が置かれています。

Query Transformations for Improving the Efficiency of ILP Systems
ILP システムの効率を向上させるためのクエリ変換

Relatively simple transformations can speed up the execution ofqueries for data mining considerably. While some ILP systems use suchtransformations, relatively little is known about them orhow they relate to each other. This paper describes a number of such transformations.Not all of them are novel, but there have been no studies comparing theirefficacy. The main contributions of thepaper are: (a) it clarifies the relationship between the transformations;(b) it contains an empirical study of what can be gained by applying the transformations;and (c) it provides some guidance on the kinds of problems that are likely tobenefit from the transformations.

比較的単純な変換で、データマイニングのクエリの実行を大幅に高速化できます。一部のILPシステムはそのような変換を使用していますが、それらやそれらが互いにどのように関連しているかについては比較的ほとんど知られていません。この論文では、そのような変換のいくつかについて説明します。それらのすべてが新規であるわけではありませんが、それらの有効性を比較した研究はありませんでした。この論文の主な貢献は次のとおりです:(a)変換間の関係を明確にします。(b)変換を適用することによって何を得ることができるかについての実証的研究が含まれています。(c)変換から恩恵を受ける可能性が高い問題の種類について、いくつかのガイダンスを提供します。

Relational Learning as Search in a Critical Region
重要な領域における探索としてのリレーショナル学習

Machine learning strongly relies on the covering test to assess whethera candidate hypothesis covers training examples. The present paper investigates learning relational concepts from examples, termed relational learning or inductive logicprogramming. In particular, it investigates the chances of successand the computational cost of relational learning, which appears to be severelyaffected by the presence of a phase transition in the covering test.To this aim, three up-to-date relational learners have been applied to a wide range of artificial, fully relationallearning problems.A first experimental observation is that the phase transition behaves as an attractor for relational learning; no matter which region the learning problem belongs to,all three learners produce hypotheses lying within or close to the phasetransition region.Second, a failure region appears. All three learners fail to learn any accurate hypothesis in this region. Quite surprisingly, the probability of failure does not systematically increase with the size of the underlying target concept: under some circumstances, longer concepts may be easier to accurately approximate than shorter ones. Some interpretations for these findings are proposed and discussed.

機械学習は、候補仮説がトレーニング例をカバーしているかどうかを評価するために、カバーテストに大きく依存しています。この論文では、例から関係概念を学習すること、つまり関係学習または帰納的論理プログラミングについて調査します。特に、カバーテストにおける相転移の存在によって大きく影響を受けると思われる関係学習の成功の可能性と計算コストを調査します。この目的のために、最新の3つの関係学習器が、人工の完全な関係学習問題に幅広く適用されています。最初の実験的観察は、相転移が関係学習のアトラクターとして動作することです。学習問題がどの領域に属しているかに関係なく、3つの学習器はすべて相転移領域内またはその近くにある仮説を生成します。2番目に、失敗領域が現れます。3つの学習器はすべて、この領域で正確な仮説を学習できません。驚くべきことに、失敗の確率は、基礎となるターゲット概念のサイズとともに系統的に増加するわけではありません。状況によっては、長い概念の方が短い概念よりも正確に近似しやすい場合があります。これらの調査結果に対するいくつかの解釈が提案され、議論されています。

ILP: A Short Look Back and a Longer Look Forward
ILP:短い振り返りと長い未来

Inductive logic programming (ILP) is built on a foundation laid byresearch in machine learning and computational logic.Armed with this strong foundation, ILP has been appliedto important and interesting problems in the life sciences, engineering and the arts.This paper begins by briefly reviewing some example applications, in orderto illustrate the benefits of ILP.In turn, the applications have broughtinto focus the need for more research into specific topics.We enumerate and elaborate five of these: (1) novel search methods;(2) incorporation of explicit probabilities; (3) incorporationof special-purpose reasoners; (4) parallel execution usingcommodity components; and (5) enhanced human interaction.It is our hypothesis that progress in each of these areascan greatly improve the contributions that can be made with ILP;and that, with assistance from research workers in other areas,significant progress in each of these areas is possible.

帰納論理プログラミング(ILP)は、機械学習と計算論理の研究によって築かれた基盤の上に構築されています。この強固な基盤を武器に、ILPは生命科学、工学、芸術における重要で興味深い問題に適用されてきました。この論文では、いくつかの例のアプリケーションを簡単にレビューすることから始めますが、ILP.Inターンの利点を説明するために、アプリケーションは特定のトピックに関するより多くの研究の必要性に焦点を当てています。これらのうちの5つを列挙し、詳しく説明します:(1)新しい検索方法。(2)明示的な確率の組み込み。(3)特殊目的推論者の組み込み。(4)商品コンポーネントを使用した並列実行。(5)人間の相互作用の強化。これらの各分野での進歩により、ILPによる貢献が大幅に改善されるというのが私たちの仮説です。そして、他の分野の研究者の支援があれば、これらの各分野で大きな進歩を遂げることができます。

Introduction to the Special Issue on Inductive Logic Programming
帰納論理プログラミング特集のイントロダクション

Learning Behavior-Selection by Emotions and Cognition in a Multi-Goal Robot Task
多目標ロボット課題における感情と認知による学習行動選択

The existence of emotion and cognition as two interacting systems, both with important roles in decision-making, has been recently advocated by neurophysiological research (LeDoux, 1998, Damasio, 1994. Following that idea, this paper presents the ALEC agent architecture which has both emotive and cognitive learning, as well as emotive and cognitive decision-making capabilities to adapt to real-world environments. These two learning mechanisms embody very different properties which can be related to those of natural emotion and cognition systems. The reported experiments test ALEC within a simulated autonomous robot which learns to perform a multi-goal and multi-step survival task when faced with real world conditions, namely continuous time and space, noisy sensors and unreliable actuators. Experimental results show that both systems contribute positively to the learning performance of the agent.

感情と認知が相互作用する2つのシステムとして存在し、どちらも意思決定において重要な役割を果たすことは、最近、神経生理学的研究によって提唱されている(LeDoux, 1998, Damasio, 1994.この考えに従って、この論文では、感情学習と認知学習の両方を備えたALECエージェントアーキテクチャと、実世界の環境に適応するための感情的および認知的意思決定能力を紹介します。これら2つの学習メカニズムは、自然の感情および認知システムのそれらに関連する可能性のある非常に異なる特性を具体化しています。報告された実験では、ALECをシミュレートした自律型ロボットでテストし、現実世界の条件、つまり連続的な時間と空間、ノイズの多いセンサー、信頼性の低いアクチュエーターに直面したときに、マルチゴールとマルチステップのサバイバルタスクを実行することを学習します。実験結果は、両方のシステムがエージェントの学習パフォーマンスにプラスの貢献をしていることを示しています。

An Empirical Study of the Use of Relevance Information in Inductive Logic Programming
帰納的論理プログラミングにおける関連性情報の利用に関する実証的研究

Inductive Logic Programming (ILP) systems construct modelsfor data using domain-specific background information.When using these systems, it is typically assumed thatsufficient human expertise is at hand to rule outirrelevant background information. Such irrelevant information can, andtypically does, hinder an ILP system’s search for good models.Here, we provide evidence that if expertiseis available that can provide a partial-orderingon sets of background predicates in terms ofrelevance to the analysis task, then this can be used to good effect byan ILP system. In particular, using data from biochemical domains,we investigate an incremental strategy of including sets of predicatesin decreasing order of relevance. Results obtained suggest that:(a) the incremental approach identifies, in substantially less time,a model that is comparable in predictive accuracy to thatobtained with all background information in place; and(b) the incremental approach using the relevance ordering performsbetter than one that does not (that is, one that adds setsof predicates randomly).For a practitioner concerned with use of ILP,the implication of these findings are two-fold:(1) when not all background information can be usedat once (either due to limitations of the ILP system, orthe nature of the domain) expert assessment of the relevanceof background predicates can assist substantiallyin the construction of good models; and(2) good “first-cut” results can be obtained quickly by a simple exclusion ofinformation known to be less relevant.

帰納的論理プログラミング(ILP)システムは、ドメイン固有の背景情報を使用してデータのモデルを構築します。これらのシステムを使用する場合、通常は、無関係な背景情報を除外するのに十分な人間の専門知識が手元にあると想定されます。このような無関係な情報は、ILPシステムが適切なモデルを検索するのを妨げる可能性があり、通常は妨げになります。ここでは、分析タスクとの関連性の観点から背景述語セットの部分的な順序付けを提供できる専門知識があれば、これをILPシステムで効果的に使用できるという証拠を示します。特に、生化学ドメインのデータを使用して、述語セットを関連性の降順に含める増分戦略を調査します。得られた結果から、(a)増分アプローチにより、大幅に短い時間で、すべての背景情報が整っている場合に取得されるモデルと予測精度が同等のモデルが特定されることが示唆されます。(b)関連性の順序付けを使用する増分アプローチは、関連性の順序付けを使用しないアプローチ(つまり、述語のセットをランダムに追加するアプローチ)よりもパフォーマンスが優れています。ILPの使用に関心のある実務家にとって、これらの発見は2つの意味を持ちます。(1)すべての背景情報を一度に使用できない場合(ILPシステムの制限またはドメインの性質により)、背景述語の関連性の専門家による評価は、優れたモデルの構築に大きく役立ちます。(2)関連性が低いことがわかっている情報を単純に除外するだけで、優れた「最初の」結果をすぐに得ることができます。

Fusion of Domain Knowledge with Data for Structural Learning in Object Oriented Domains
オブジェクト指向領域における構造学習のためのドメイン知識とデータの融合

When constructing a Bayesian network, it can be advantageous to employ structural learning algorithms to combine knowledge captured in databases with prior information provided by domain experts. Unfortunately, conventional learning algorithms do not easily incorporate prior information, if this information is too vague to be encoded as properties that are local to families of variables. For instance, conventional algorithms do not exploit prior information about repetitive structures, which are often found in object oriented domains such as computer networks, large pedigrees and genetic analysis. In this paper we propose a method for doing structural learning in object oriented domains. It is demonstrated that this method is more efficient than conventional algorithms in such domains, and it is argued that the method supports a natural approach for expressing and incorporating prior information provided by domain experts.

ベイジアンネットワークを構築する際には、構造学習アルゴリズムを使用して、データベースに取得された知識とドメインの専門家から提供された事前情報を組み合わせると有利な場合があります。残念ながら、従来の学習アルゴリズムでは、事前情報が曖昧すぎて変数のファミリーにローカルなプロパティとしてエンコードできない場合、事前情報を簡単に組み込むことができません。たとえば、従来のアルゴリズムは、コンピューターネットワーク、大規模な血統、遺伝子分析などのオブジェクト指向の領域でよく見られる反復構造に関する事前情報を利用しません。この論文では、オブジェクト指向領域における構造学習の方法を提案します。この手法は、このようなドメインにおいて従来のアルゴリズムよりも効率的であることが実証されており、この手法は、ドメインの専門家によって提供される事前情報を表現し、組み込むための自然なアプローチをサポートしていると主張されています。

Preference Elicitation via Theory Refinement
理論の洗練による選好誘発

We present an approach to elicitation of user preference models in which assumptions can be usedto guide but not constrain the elicitation process. We demonstrate that when domain knowledge isavailable, even in the form of weak and somewhat inaccurate assumptions, significantly less datais required to build an accurate model of user preferences than when no domain knowledge isprovided. This approach is based on the KBANN (Knowledge-Based Artificial Neural Network)algorithm pioneered by Shavlik and Towell (1989). We demonstrate thisapproach through two examples, one involves preferences under certainty, and the other involvespreferences under uncertainty. In the case of certainty, we show how to encode assumptionsconcerning preferential independence and monotonicity in a KBANN network, which can be trainedusing a variety of preferential information including simple binary classification. In the case ofuncertainty, we show how to construct a KBANN network that encodes certain types of dominancerelations and attitude toward risk. The resulting network can be trained using answers to standardgamble questions and can be used as an approximate representation of a person’s preferences. Weempirically evaluate our claims by comparing the KBANN networks with simple backpropagationartificial neural networks in terms of learning rate and accuracy. For the case of uncertainty,the answers to standard gamble questions used in the experiment are taken from an actual medicaldata set first used by Miyamoto and Eraker (1988). In the case ofcertainty, we define a measure to which a set of preferences violate a domain theory, and examinethe robustness of the KBANN network as this measure of domain theory violation varies.

私たちは、仮定を使用して抽出プロセスをガイドすることはできるが、制約はしない、ユーザー嗜好モデル抽出のアプローチを提示します。ドメイン知識が利用できる場合、たとえ弱く、多少不正確な仮定であっても、ドメイン知識が提供されていない場合よりも、ユーザー嗜好の正確なモデルを構築するために必要なデータが大幅に少なくなることを示します。このアプローチは、ShavlikとTowell (1989)が開発したKBANN (知識ベース人工ニューラルネットワーク)アルゴリズムに基づいています。私たちは、2つの例を通してこのアプローチを示します。1つは確実性の下での嗜好に関する例、もう1つは不確実性の下での嗜好に関する例です。確実性の場合、単純なバイナリ分類を含むさまざまな嗜好情報を使用してトレーニングできるKBANNネットワークで、嗜好の独立性と単調性に関する仮定をエンコードする方法を示します。不確実性の場合、特定の種類の優位関係とリスクに対する態度をエンコードするKBANNネットワークを構築する方法を示します。結果として得られるネットワークは、標準的なギャンブルの質問に対する回答を使用してトレーニングでき、個人の好みのおおよその表現として使用できます。学習率と精度の観点から、KBANNネットワークを単純なバックプロパゲーション人工ニューラルネットワークと比較することにより、私たちの主張を実証的に評価します。不確実性の場合、実験で使用される標準的なギャンブルの質問に対する回答は、MiyamotoとEraker (1988)によって初めて使用された実際の医療データセットから取得されます。確実性の場合、一連の好みがドメイン理論に違反する尺度を定義し、このドメイン理論違反の尺度が変化するときのKBANNネットワークの堅牢性を調べます。

Combining Knowledge from Different Sources in Causal Probabilistic Models
因果確率モデルにおける異なる情報源からの知識の組み合わせ

Building probabilistic and decision-theoretic models requires aconsiderable knowledge engineering effort in which the mostdaunting task is obtaining the numerical parameters. Authors ofBayesian networks usually combine various sources of information,such as textbooks, statistical reports, databases, and expertjudgement. In this paper, we demonstrate the risks of such acombination, even when this knowledge encompasses such seeminglypopulation-independent characteristics as sensitivity andspecificity of medical symptoms. We show that the criteria “donot combine knowledge from different sources” or “use only datafrom the setting in which the model will be used” are neithernecessary nor sufficient to guarantee the correctness of themodel. Instead, we offer graphical criteria for determining whenknowledge from different sources can be safely combined into thegeneral population model. We also offer a method for buildingsubpopulation models. The analysis performed in this paper and thecriteria we propose may be useful in such fields as knowledgeengineering, epidemiology, machine learning, and statisticalmeta-analysis.

確率モデルと意思決定理論モデルの構築には、かなりの知識工学的努力が必要ですが、その中で最も困難な作業は数値パラメータの取得です。ベイジアンネットワークの作成者は通常、教科書、統計レポート、データベース、専門家の判断など、さまざまな情報源を組み合わせます。この論文では、この知識が医学的症状の感度と特異性など、一見人口に依存しない特性を含む場合でも、このような組み合わせのリスクを示します。「異なる情報源からの知識を組み合わせない」または「モデルが使用される設定のデータのみを使用する」という基準は、モデルの正確性を保証するために必要でも十分でもないことを示します。代わりに、異なる情報源からの知識を一般人口モデルに安全に組み合わせることができるかどうかを判断するためのグラフィカルな基準を提供します。また、サブ人口モデルを構築する方法も提供します。この論文で実行された分析と提案する基準は、知識工学、疫学、機械学習、統計メタ分析などの分野で役立つ可能性があります。

Introduction to the Special Issue on the Fusion of Domain Knowledge with Data for Decision Support
特集「意思決定支援のためのドメイン知識とデータの融合」の紹介

Learning Probabilistic Models: An Expected Utility Maximization Approach
確率モデルの学習:期待効用最大化アプローチ

We consider the problem of learning a probabilistic model from the viewpoint of an expected utility maximizing decision maker/investor who would use the model to make decisions (bets), which result in well defined payoffs.In our new approach, we seek good out-of-sample model performance by considering a one-parameter family of Pareto optimal models, which we define in terms of consistency with the training data and consistency with a prior (benchmark) model. We measure the former by means of the large-sample distribution of a vector of sample-averaged features, and the latter by means of a generalized relative entropy.We express each Pareto optimal model as the solution of a strictly convex optimization problem and its strictly concave (and tractable) dual. Each dual problem is a regularized maximization of expected utility over a well-defined family of functions.Each Pareto optimal model is robust: maximizing worst-case outperformance relative to the benchmark model.Finally, we select the Pareto optimal model with maximum (out-of-sample) expected utility.We show that our method reduces to the minimum relative entropy method if and only if the utility function is a member of a three-parameter logarithmic family.

私たちは、期待効用を最大化する意思決定者/投資家の観点から確率モデルを学習する問題について考えます。意思決定者/投資家は、モデルを使用して意思決定(賭け)を行い、その結果、明確に定義されたペイオフが得られます。新しいアプローチでは、トレーニングデータとの一貫性と以前の(ベンチマーク)モデルとの一貫性の観点から定義される、1パラメータのパレート最適モデルファミリを検討することにより、サンプル外モデルのパフォーマンスを向上します。前者はサンプル平均特徴のベクトルの大サンプル分布によって測定し、後者は一般化相対エントロピーによって測定します。各パレート最適モデルは、厳密に凸な最適化問題とその厳密に凹な(扱いやすい)双対の解として表現します。各双対問題は、明確に定義された関数群に対する期待効用の正規化された最大化です。各パレート最適モデルは堅牢で、ベンチマークモデルと比較して最悪の場合のパフォーマンスを最大化します。最後に、最大(サンプル外)期待効用を持つパレート最適モデルを選択します。効用関数が3パラメータ対数族のメンバーである場合に限り、この方法が最小相対エントロピー法に簡略化されることを示します。

Tree Induction vs. Logistic Regression: A Learning-Curve Analysis
木帰納法 vs. ロジスティック回帰: 学習曲線分析

Tree induction and logistic regression are two standard,off-the-shelf methods for building models for classification. Wepresent a large-scale experimental comparison of logistic regressionand tree induction, assessing classification accuracy and the qualityof rankings based on class-membership probabilities. We use alearning-curve analysis to examine the relationship of these measuresto the size of the training set. The results of the study showseveral things. (1) Contrary to some prior observations,logistic regression does not generally outperform tree induction. (2)More specifically, and not surprisingly, logistic regression is betterfor smaller training sets and tree induction for larger data sets.Importantly, this often holds for training sets drawn from the samedomain (that is, the learning curves cross), so conclusions aboutinduction-algorithm superiority on a given domain must be based on ananalysis of the learning curves. (3) Contrary to conventional wisdom,tree induction is effective at producing probability-based rankings,although apparently comparatively less so for a given training-setsize than at making classifications. Finally, (4) the domains onwhich tree induction and logistic regression are ultimately preferablecan be characterized surprisingly well by a simple measure ofthe separability of signal from noise.

ツリー誘導とロジスティック回帰は、分類モデルを構築するための2つの標準的な既成の方法です。ロジスティック回帰とツリー誘導の大規模な実験的比較を提示し、クラス所属確率に基づく分類精度とランキングの品質を評価します。学習曲線分析を使用して、これらの尺度とトレーニングセットのサイズの関係を調べます。研究の結果、いくつかのことが明らかになりました。(1)以前の観察とは反対に、ロジスティック回帰は一般にツリー誘導よりも優れているわけではありません。(2)さらに具体的には、予想どおり、ロジスティック回帰は小規模なトレーニングセットに適しており、ツリー誘導は大規模なデータセットに適しています。重要なことは、これは多くの場合、同じドメインから抽出されたトレーニングセットに当てはまる(つまり、学習曲線が交差する)ため、特定のドメインでの誘導アルゴリズムの優位性に関する結論は、学習曲線の分析に基づく必要があることです。（３）従来の認識に反して、ツリー誘導は確率に基づくランキングを生成するのに効果的であるが、与えられた訓練セットのサイズに対しては分類を行うのに比べると明らかに比較的効果が低い。最後に、（４）ツリー誘導とロジスティック回帰が最終的に好ましい領域は、信号とノイズの分離可能性の単純な尺度によって驚くほどよく特徴付けることができます。

Bottom-Up Relational Learning of Pattern Matching Rules for Information Extraction
情報抽出のためのパターンマッチングルールのボトムアップ関係学習

Information extraction is a form of shallow text processing that locates aspecified set of relevant items in a natural-language document. Systems forthis task require significant domain-specific knowledge and are time-consumingand difficult to build by hand, making them a good application for machinelearning. We present an algorithm, RAPIER, that uses pairs ofsample documents and filled templates to induce pattern-match rules thatdirectly extract fillers for the slots in the template. RAPIER is abottom-up learning algorithm that incorporates techniques from severalinductive logic programming systems. We have implemented the algorithm in a system that allows patterns to haveconstraints on the words, part-of-speech tags, and semantic classespresent in the filler and the surrounding text. We present encouragingexperimental results on two domains.

情報抽出は、自然言語ドキュメント内の特定の関連アイテムのセットを見つける浅いテキスト処理の形式です。このタスクのためのシステムは、ドメイン固有の知識を大量に必要とし、時間がかかり、手作業で構築するのが難しいため、機械学習の優れたアプリケーションになります。サンプルドキュメントと充填されたテンプレートのペアを使用して、テンプレート内のスロットのフィラーを直接抽出するパターンマッチルールを誘導するアルゴリズムRAPIERを紹介します。RAPIERは、いくつかの帰納的論理プログラミングシステムの手法を組み込んだボトムアップ学習アルゴリズムです。このアルゴリズムは、パターンがフィラーと周囲のテキストに存在する単語、品詞タグ、およびセマンティッククラスに制約を持たせることを可能にするシステムに実装しました。私たちは、2つのドメインで有望な実験結果を提示します。

On the Proper Learning of Axis-Parallel Concepts
軸平行概念の適切な学習について

We study the proper learnability of axis-parallel concept classesin the PAC-learning and exact-learning models.These classes include union of boxes, DNF, decision trees and multivariate polynomials.For constant-dimensional axis-parallel concepts C we show that the following problems have time complexities that are within a polynomial factor of each other.C is α-properly exactly learnable (with hypotheses of sizeat most α times the target size)from membership and equivalence queries.C is α-properly PAC learnable (without membership queries)under any product distribution.There is an α-approximation algorithm for the MINEQUIC problem (given a g ∈ Cfind a minimal size f ∈ C that is logically equivalent tog).In particular, if one has polynomial time complexity, they all do.Using this we give the first proper-learning algorithmof constant-dimensional decision trees andthe first negative results in proper learning from membershipand equivalence queries for many classes.For axis-parallel concepts over a nonconstant dimension we show that with the equivalence oracle (1) ⇒ (3). We use this to show that (binary) decision trees are not properly learnable in polynomial time (assuming P ≠ NP) and DNF is not sε-properly learnable (ε < 1) in polynomial time even with an NP-oracle (assuming Σ2P ≠ PNP).

私たちは、PAC学習モデルと完全学習モデルにおける軸平行概念クラスの適切な学習可能性について調査します。これらのクラスには、ボックスの結合、DNF、決定木、多変量多項式が含まれます。定数次元の軸平行概念Cについて、次の問題の時間複雑度が互いに多項式係数以内であることを示します。Cは、メンバーシップおよび同値クエリから α 適切に正確に学習可能です(ターゲットサイズの最大 α 倍のサイズの仮説を使用)。Cは、任意の積分布の下で α 適切にPAC学習可能です(メンバーシップクエリなし)。MINEQUIC問題(g∈Cが与えられた場合、gと論理的に同等な最小サイズf∈Cを見つけます)には、α 近似アルゴリズムがあります。特に、1つが多項式時間複雑度を持つ場合、それらはすべて多項式時間複雑度を持ちます。これを使用して、定数次元決定木の最初の適切な学習アルゴリズムと、多くのメンバーシップおよび同値クエリからの適切な学習における最初の否定的な結果を示します。クラス。非定数次元上の軸平行概念については、同値オラクル(1)⇒(3)でそれを示します。これを使用して、(バイナリ)決定木は多項式時間で適切に学習できない(P≠NPと仮定)こと、およびNPオラクル(Σ2P≠PNPと仮定)を使用してもDNFは多項式時間でsε適切に学習できない(ε< 1)ことを示します。

Think Globally, Fit Locally: Unsupervised Learning of Low Dimensional Manifolds
Think Globally, Fit Locally: 低次元多様体の教師なし学習

The problem of dimensionality reduction arises in many fields ofinformation processing, including machine learning, data compression,scientific visualization, pattern recognition, and neural computation.Here we describe locally linear embedding (LLE), an unsupervisedlearning algorithm that computes low dimensional, neighborhoodpreserving embeddings of high dimensional data. The data, assumed tobe sampled from an underlying manifold, are mapped into a single global coordinatesystem of lower dimensionality. The mapping is derived from thesymmetries of locally linear reconstructions, and the actualcomputation of the embedding reduces to a sparse eigenvalue problem.Notably, the optimizations in LLE—though capable of generatinghighly nonlinear embeddings—are simple to implement, and they do notinvolve local minima. In this paper, we describe the implementation of the algorithmin detail and discuss several extensions that enhance its performance.We present results of the algorithm applied to data sampled fromknown manifolds, as well as to collections of images offaces, lips, and handwritten digits. These examplesare used to provide extensive illustrations ofthe algorithm’s performance—both successes and failures—and to relate the algorithm to previous and ongoing work in nonlinear dimensionality reduction.

次元削減の問題は、機械学習、データ圧縮、科学的視覚化、パターン認識、ニューラル計算など、情報処理の多くの分野で発生します。ここでは、高次元データの低次元で近傍保存の埋め込みを計算する教師なし学習アルゴリズムである、局所線形埋め込み(LLE)について説明します。基礎となる多様体からサンプリングされたと想定されるデータは、より低次元の単一のグローバル座標系にマッピングされます。マッピングは、局所線形再構成の対称性から導出され、埋め込みの実際の計算は、スパース固有値問題に簡略化されます。特に、LLEの最適化は、非常に非線形な埋め込みを生成できますが、実装が簡単で、局所最小値を含みません。この論文では、アルゴリズムの実装について詳細に説明し、そのパフォーマンスを向上させるいくつかの拡張機能について説明します。既知の多様体からサンプリングされたデータ、および顔、唇、手書きの数字の画像のコレクションにアルゴリズムを適用した結果を示します。これらの例は、アルゴリズムのパフォーマンス(成功と失敗の両方)を広範囲に説明し、アルゴリズムを非線形次元削減の過去および進行中の研究に関連付けるために使用されます。

Optimally-Smooth Adaptive Boosting and Application to Agnostic Learning
最適に滑らかな適応ブースティングと不可知論的学習への応用

We describe a new boosting algorithm that is the first such algorithm to be both smooth and adaptive. These two features make possible performance improvements for many learning tasks whose solutions use a boosting technique. The boosting approach was originally suggested for the standard PAC model; we analyze possible applications of boosting in the context of agnostic learning, which is more realistic than the PAC model. We derive a lower bound for the final error achievable by boosting in the agnostic model and show that our algorithm actually achieves that accuracy (within a constant factor). We note that the idea of applying boosting in the agnostic model was first suggested by Ben-David, Long and Mansour (2001) and the solution they give is improved in the present paper. The accuracy we achieve is exponentially better with respect to the standard agnostic accuracy parameter β. We also describe the construction of a boosting “tandem” whose asymptotic number of iterations is the lowest possible (in both γ and ε and whose smoothness is optimal in terms of Õ(·). This allows adaptively solving problems whose solution is based on smooth boosting (like noise tolerant boosting and DNF membership learning), while preserving the original (non-adaptive) solution’s complexity.

私たちは、スムーズかつ適応型である初めての新しいブースティングアルゴリズムについて説明します。この2つの機能により、ブースティング手法を使用するソリューションを持つ多くの学習タスクのパフォーマンス向上が可能になります。ブースティングアプローチは、もともと標準PACモデル用に提案されました。私たちは、PACモデルよりも現実的な、不可知論的学習のコンテキストでのブースティングの適用可能性を分析します。不可知論的モデルでブースティングによって達成可能な最終誤差の下限を導出し、我々のアルゴリズムが実際にその精度(定数倍以内)を達成することを示します。不可知論的モデルでブースティングを適用するというアイデアは、Ben-David、Long、Mansour (2001)によって最初に提案され、彼らが示すソリューションが本論文で改善されていることに注意してください。我々が達成する精度は、標準の不可知論的精度パラメーター β に対して指数関数的に優れています。また、漸近反復回数が可能な限り少なく（γ と ε の両方で）、Õ(·)に関して滑らかさが最適であるブースティング「タンデム」の構築についても説明します。これにより、元の（非適応型）ソリューションの複雑さを維持しながら、スムーズブースティング（ノイズ耐性ブースティングやDNFメンバーシップ学習など）に基づくソリューションを持つ問題を適応的に解決できます。

Task Clustering and Gating for Bayesian Multitask Learning
ベイジアンマルチタスク学習のためのタスククラスタリングとゲーティング

Modeling a collection of similar regression or classification tasks can be improved by making the tasks ‘learn from each other’. In machine learning, this subject is approached through ‘multitask learning’, where parallel tasksare modeled as multiple outputs of the same network. In multilevelanalysis this is generally implemented through the mixed-effectslinear model where a distinction is made between ‘fixed effects’, whichare the same for all tasks, and ‘random effects’, which may varybetween tasks. In the present article we will adopta Bayesian approach in which some of the model parameters are shared (the same for all tasks) and others more loosely connected througha joint prior distribution that can be learned from the data. We seekin this way to combine the best parts of both the statisticalmultilevel approach and the neural network machinery.The standard assumption expressed in both approaches is that each task can learn equally well from any other task. In this article we extend the model by allowing more differentiation in the similarities between tasks.One such extension is to make the priormean depend on higher-level task characteristics. More unsupervisedclustering of tasks is obtained if we go from a single Gaussian priorto a mixture of Gaussians. This can be further generalized to amixture of experts architecture with the gates depending on task characteristics.All three extensions are demonstrated through application both on anartificial data set and on two real-world problems, one a schoolproblem and the other involving single-copy newspaper sales.

類似した回帰または分類タスクのコレクションをモデル化することは、タスクを「お互いから学習」させることによって改善できます。機械学習では、この主題は「マルチタスク学習」を通じてアプローチされ、並列タスクが同じネットワークの複数の出力としてモデル化されます。マルチレベル分析では、これは通常、すべてのタスクで同じ「固定効果」とタスク間で異なる可能性がある「ランダム効果」を区別する混合効果線形モデルを通じて実装されます。この記事では、一部のモデルパラメータが共有され(すべてのタスクで同じ)、他のパラメータはデータから学習できる結合事前分布を通じてより緩く接続されるベイズアプローチを採用します。この方法で、統計的マルチレベルアプローチとニューラルネットワーク機構の両方の最良の部分を組み合わせることを目指します。両方のアプローチで表現される標準的な仮定は、各タスクが他のどのタスクからも同様に学習できるということです。この記事では、タスク間の類似性をさらに差別化できるようにモデルを拡張します。そのような拡張の1つは、事前平均を高レベルのタスク特性に依存させることです。単一のガウス分布からガウス分布の混合に移行すると、より教師なしのタスククラスタリングが可能になります。これは、タスクの特性に応じてゲートを使用するエキスパート混合アーキテクチャにさらに一般化できます。3つの拡張機能はすべて、人工データセットと、学校の問題と新聞の単一コピー販売に関する2つの実際の問題の両方に適用することで実証されています。

The em Algorithm for Kernel Matrix Completion with Auxiliary Data
補助データによるカーネル行列補完のための em アルゴリズム

In biological data, it is often the case that observed data areavailable only for a subset of samples. When a kernel matrix isderived from such data, we have to leave the entries for unavailablesamples as missing. In this paper, the missing entries are completedby exploiting an auxiliary kernel matrix derived from anotherinformation source. The parametric model of kernel matrices iscreated as a set of spectral variants of the auxiliary kernel matrix,and the missing entries are estimated by fitting this model to theexisting entries. For model fitting, we adopt the em algorithm(distinguished from the EM algorithm of Dempster et al., 1977) basedon the information geometry of positive definite matrices. We willreport promising results on bacteria clustering experiments using twomarker sequences: 16S and gyrB.

生物学的データでは、観察されたデータがサンプルのサブセットについてのみ利用可能であることがよくあります。カーネル行列がそのようなデータから導出される場合、unavailablesamplesのエントリを欠落として残す必要があります。この論文では、不足しているエントリは、別の情報ソースから派生した補助カーネルマトリックスを利用することによって完成されます。カーネル行列のパラメトリックモデルは、補助カーネル行列のスペクトルバリアントのセットとして作成され、欠落しているエントリは、このモデルを既存のエントリに適合させることによって推定されます。モデルフィッティングには、正定値行列の情報幾何学に基づくemアルゴリズム(Dempsterら, 1977のEMアルゴリズムとは区別される)を採用します。16SとgyrBの2つのマーカー配列を用いた細菌クラスタリング実験について、有望な結果を報告します。

Designing Committees of Models through Deliberate Weighting of Data Points
データポイントの意図的な重み付けによるモデルの委員会の設計

In the adaptive derivation of mathematical models from data, each data point should contribute with a weight reflecting the amount of confidence one has in it. When no additional information for data confidence is available, all the data points should be considered equal, and are also generally given the same weight. In the formation of committees of models, however, this is often not the case and the data points may exercise unequal, even random, influence over the committee formation.In this paper, a principled approach to committee design is presented. The construction of a committee design matrix is detailed through which each data point will contribute to the committee formation with a fixed weight, while contributing with different individual weights to the derivation of the different constituent models, thus encouraging model diversity whilst not biasing the committee inadvertently towards any particular data points. Not distinctly an algorithm, it is instead a framework within which several different committee approaches may be realised.Whereas the focus in the paper lies entirely on regression, the principles discussed extend readily to classification.

データから数学モデルを適応的に導出する場合、各データポイントは、そのデータポイントに対する信頼度を反映する重みで寄与する必要があります。データの信頼度に関する追加情報がない場合、すべてのデータポイントは同等とみなされ、通常、同じ重みが与えられます。ただし、モデル委員会の形成では、そうならないことが多く、データポイントは委員会の形成に対して不平等な、さらにはランダムな影響を及ぼす可能性があります。この論文では、委員会の設計に対する原則的なアプローチが提示されています。委員会設計マトリックスの構築が詳細に説明されており、このマトリックスでは、各データポイントが固定の重みで委員会の形成に寄与する一方で、異なる構成モデルの導出には異なる個別の重みで寄与します。これにより、モデルの多様性が促進され、委員会が特定のデータポイントに不注意に偏ることがなくなります。これは、明確にアルゴリズムというわけではなく、複数の異なる委員会アプローチを実現できるフレームワークです。この論文では回帰に焦点が当てられていますが、議論されている原則は分類にも容易に適用できます。

FINkNN: A Fuzzy Interval Number k-Nearest Neighbor Classifier for Prediction of Sugar Production from Populations of Samples
FINkNN:サンプルの母集団からの砂糖生産の予測のためのファジィ区間数k最近傍分類器

This work introduces FINkNN, a k-nearest-neighbor classifier operating over the metric lattice ofconventional interval-supported convex fuzzy sets. We show that for problems involvingpopulations of measurements, data can be represented by fuzzy interval numbers (FINs) and wepresent an algorithm for constructing FINs from such populations. We then present a lattice-theoreticmetric distance between FINs with arbitrary-shapedmembership functions, which formsthe basis for FINkNN’s similarity measurements. We apply FINkNN to the task of predictingannual sugar production based on populations of measurements supplied by Hellenic SugarIndustry. We show that FINkNN improves prediction accuracy on this task, and discuss thebroader scope and potential utility of these techniques.

この研究では、従来の間隔でサポートされた凸ファジィ集合のメトリック格子上で動作するk-nearest-neighbor分類器であるFINkNNを紹介します。測定値の母集団を含む問題の場合、データはファジー間隔数(FIN)で表すことができ、そのような母集団からFINを構築するためのアルゴリズムを提示することを示しました。次に、FINkNNの相似性測定の基礎を形成する任意の形状のメンバーシップ関数を持つFIN間の格子理論距離を示します。FINkNNを、Hellenic SugarIndustryから提供された測定値の母集団に基づいて年間砂糖生産量を予測するタスクに適用します。FINkNNがこのタスクの予測精度を向上させることを示し、これらの手法のより広範な範囲と潜在的な有用性について説明します。

On Nearest-Neighbor Error-Correcting Output Codes with Application to All-Pairs Multiclass Support Vector Machines
全ペア多クラスサポートベクトルマシンへの適用による最近傍誤り訂正出力コードについて

A common way of constructing a multiclass classifier is by combining the outputs of several binary ones, according to an error-correcting output code (ECOC) scheme. The combination is typically done via a simple nearest-neighbor rule that finds the class that is closest in some sense to the outputs of the binary classifiers. For these nearest-neighbor ECOCs, we improve existing bounds on the error rate of the multiclass classifier given the average binary distance. The new bounds provide insight into the one-versus-rest and all-pairs matrices, which are compared through experiments with standard datasets. The results also show why elimination (also known as DAGSVM) and Hamming decoding often achieve the same accuracy.

多クラス分類子を構築する一般的な方法は、エラー訂正出力コード(ECOC)スキームに従って、いくつかのバイナリ分類子の出力を組み合わせることです。この組み合わせは、通常、バイナリ分類器の出力に何らかの意味で最も近いクラスを見つける単純な最近傍ルールによって行われます。これらの最近傍ECOCでは、平均バイナリ距離を前提として、多クラス分類器のエラー率の既存の境界を改善します。新しい境界は、標準データセットでの実験を通じて比較されるone-versus-rest行列とall-pair行列に関する洞察を提供します。この結果は、エリミネーション(DAGSVM)とハミング復号化で同じ精度を達成することが多い理由も示しています。

関連記事