Journal of Statistical Software: List of Articles in Volume 102

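This page lists the contents of Journal of Statistical Software Volume 102. Each title and abstract is followed by a Japanese translation prepared with the help of machine translation.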

Articles

GaussianProcesses.jl: A Nonparametric Bayes Package for the Julia Language
GaussianProcesses.jl：Julia言語のためのノンパラメトリックベイズパッケージ

Gaussian processes are a class of flexible nonparametric Bayesian tools that are widely used across the sciences, and in industry, to model complex data sources. Key to applying Gaussian process models is the availability of well-developed open source software, which is available in many programming languages. In this paper, we present a tutorial of the GaussianProcesses.jl package that has been developed for the Julia programming language. GaussianProcesses.jl utilizes the inherent computational benefits of the Julia language, including multiple dispatch and just-in-time compilation, to produce a fast, flexible and user-friendly Gaussian processes package. The package provides many mean and kernel functions with supporting inference tools to fit exact Gaussian process models, as well as a range of alternative likelihood functions to handle non-Gaussian data (e.g., binary classification models) and sparse approximations for scalable Gaussian processes. The package makes efficient use of existing Julia packages to provide users with a range of optimization and plotting tools.
ガウス過程は、複雑なデータソースをモデル化するために科学の諸分野や産業界で広く使用されている、柔軟なノンパラメトリックベイズ手法の一種です。ガウス過程モデルを適用するうえで鍵となるのは、十分に開発されたオープンソースソフトウェアが利用できることであり、そうしたソフトウェアは多くのプログラミング言語で提供されています。本論文では、Juliaプログラミング言語用に開発されたGaussianProcesses.jlパッケージのチュートリアルを紹介します。GaussianProcesses.jlは、多重ディスパッチやジャストインタイムコンパイルといったJulia言語固有の計算上の利点を活用し、高速で柔軟かつユーザーフレンドリーなガウス過程パッケージを実現しています。このパッケージは、厳密なガウス過程モデルを当てはめるための推論ツールを伴う多数の平均関数とカーネル関数に加え、非ガウスデータ(二値分類モデルなど)を扱うためのさまざまな代替尤度関数や、スケーラブルなガウス過程のためのスパース近似を提供します。また、既存のJuliaパッケージを効率的に利用することで、さまざまな最適化ツールとプロットツールをユーザーに提供します。

Multivariate Normal Variance Mixtures in R: The R Package nvmix
Rにおける多変量正規分散混合：Rパッケージnvmix

We present the features and implementation of the R package nvmix for the class of normal variance mixtures including Student t and normal distributions. The package provides functionalities for such distributions, notably the evaluation of the distribution and density function as well as likelihood-based parameter estimation. The distributional family is specified through the quantile function of the underlying mixing random variable. The R package nvmix thus allows one to model multivariate distributions well beyond the classical multivariate normal and t case. Additional functionalities include graphical goodness-of-fit assessment, the estimation of the risk measures value-at-risk and expected shortfall for univariate normal variance mixture distributions and functions to work with normal variance mixture copulas, such as sampling and the evaluation of normal variance mixture copulas and their densities. Furthermore, the package nvmix also provides functionalities for the evaluation of the distribution and density function as well as random variate generation for the more general class of grouped normal variance mixtures.
スチューデントのt分布や正規分布を含む正規分散混合のクラスを対象とするRパッケージnvmixの機能と実装を紹介します。このパッケージは、こうした分布に対する機能、特に分布関数と密度関数の評価、および尤度に基づくパラメータ推定を提供します。分布族は、基礎となる混合確率変数の分位点関数によって指定されます。したがって、Rパッケージnvmixを使用すると、古典的な多変量正規分布や多変量t分布の場合をはるかに超えた多変量分布をモデル化できます。その他の機能として、グラフィカルな適合度評価、単変量正規分散混合分布に対するリスク尺度であるバリュー・アット・リスクと期待ショートフォールの推定、およびサンプリングや正規分散混合コピュラとその密度の評価など、正規分散混合コピュラを扱うための関数があります。さらに、パッケージnvmixは、より一般的なクラスであるグループ化正規分散混合に対しても、分布関数と密度関数の評価、および乱数生成の機能を提供します。
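
The abstract notes that the distributional family is specified through the quantile function of the mixing variable. As a rough illustration only (this is plain Python, not the nvmix API), a normal variance mixture can be sampled as X = loc + sqrt(W) * A Z, where W = qmix(U), Z is standard normal and A is a Cholesky factor of the scale matrix. The names `sample_nvm`, `cholesky` and `pareto_quantile`, and the choice of a Pareto mixing variable, are assumptions made for this sketch.

```python
import math
import random


def cholesky(S):
    """Lower-triangular A with A A^T = S, for a small symmetric positive definite S."""
    d = len(S)
    A = [[0.0] * d for _ in range(d)]
    for i in range(d):
        for j in range(i + 1):
            s = sum(A[i][k] * A[j][k] for k in range(j))
            if i == j:
                A[i][j] = math.sqrt(S[i][i] - s)
            else:
                A[i][j] = (S[i][j] - s) / A[j][j]
    return A


def sample_nvm(n, qmix, loc, scale, rng):
    """Draw n samples of X = loc + sqrt(W) * A Z with W = qmix(U), U ~ Uniform[0, 1)."""
    A = cholesky(scale)
    d = len(loc)
    samples = []
    for _ in range(n):
        w = qmix(rng.random())  # mixing variable obtained from its quantile function
        z = [rng.gauss(0.0, 1.0) for _ in range(d)]
        az = [sum(A[i][k] * z[k] for k in range(i + 1)) for i in range(d)]
        samples.append([loc[i] + math.sqrt(w) * az[i] for i in range(d)])
    return samples


def pareto_quantile(u, alpha=3.0):
    """Quantile function of a Pareto(alpha) variable on [1, inf); hypothetical example choice."""
    return (1.0 - u) ** (-1.0 / alpha)


if __name__ == "__main__":
    rng = random.Random(1)
    draws = sample_nvm(5, pareto_quantile, [0.0, 0.0], [[1.0, 0.5], [0.5, 1.0]], rng)
    for x in draws:
        print(x)
```

Passing `lambda u: 1.0` as `qmix` recovers the multivariate normal case, since W is then constant.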

covsim: An R Package for Simulating Non-Normal Data for Structural Equation Models Using Copulas
covsim：コピュラを用いた構造方程式モデルの非正規データをシミュレーションするためのRパッケージ

In factor analysis and structural equation modeling non-normal data simulation is traditionally performed by specifying univariate skewness and kurtosis together with the target covariance matrix. However, this leaves little control over the univariate distributions and the multivariate copula of the simulated vector. In this paper we explain how a more flexible simulation method called vine-to-anything (VITA) may be obtained from copula-based techniques, as implemented in a new R package, covsim. VITA is based on the concept of a regular vine, where bivariate copulas are coupled together into a full multivariate copula. We illustrate how to simulate continuous and ordinal data for covariance modeling, and how to use the new package discnorm to test for underlying normality in ordinal data. An introduction to copula and vine simulation is provided in the appendix.
因子分析と構造方程式モデリングでは、非正規データのシミュレーションは従来、単変量の歪度と尖度を目標共分散行列と併せて指定することによって行われてきました。しかし、この方法では、シミュレートされるベクトルの単変量分布と多変量コピュラをほとんど制御できません。本論文では、新しいRパッケージcovsimに実装されている、vine-to-anything(VITA)と呼ばれるより柔軟なシミュレーション手法が、コピュラに基づく手法からどのように得られるかを説明します。VITAは、二変量コピュラを結合して完全な多変量コピュラを構成する正則ヴァイン(regular vine)の概念に基づいています。共分散モデリングのために連続データと順序データをシミュレートする方法、および新しいパッケージdiscnormを使用して順序データの背後にある正規性を検定する方法を示します。コピュラとヴァインによるシミュレーションの入門は付録に掲載されています。

rags2ridges: A One-Stop-ℓ2-Shop for Graphical Modeling of High-Dimensional Precision Matrices
rags2ridges：高次元精度行列のグラフィカルモデリングのためのワンストップℓ2ショップ

A graphical model is an undirected network representing the conditional independence properties between random variables. Graphical modeling has become part and parcel of systems or network approaches to multivariate data, in particular when the variable dimension exceeds the observation dimension. rags2ridges is an R package for graphical modeling of high-dimensional precision matrices through ridge (ℓ2) penalties. It provides a modular framework for the extraction, visualization, and analysis of Gaussian graphical models from high-dimensional data. Moreover, it can handle the incorporation of prior information as well as multiple heterogeneous data classes. As such, it provides a one-stop-ℓ2-shop for graphical modeling of high-dimensional precision matrices. The functionality of the package is illustrated with an example dataset pertaining to blood-based metabolite measurements in persons suffering from Alzheimer’s disease.
グラフィカルモデルは、確率変数間の条件付き独立性を表す無向ネットワークです。グラフィカルモデリングは、特に変数の次元が観測の次元を超える場合に、多変量データに対するシステム論的アプローチやネットワークアプローチに不可欠な要素となっています。rags2ridgesは、リッジ(ℓ2)ペナルティによる高次元精度行列のグラフィカルモデリングのためのRパッケージです。高次元データからガウスグラフィカルモデルを抽出、視覚化、分析するためのモジュール式フレームワークを提供します。さらに、事前情報の組み込みや、複数の異質なデータクラスも扱えます。そのため、高次元精度行列のグラフィカルモデリングのためのワンストップℓ2ショップとなっています。パッケージの機能は、アルツハイマー病患者の血液中の代謝物測定に関するデータセットを例に示されています。

sensobol: An R Package to Compute Variance-Based Sensitivity Indices
sensobol：分散ベースの感度指数を計算するためのRパッケージ

The R package sensobol provides several functions to conduct variance-based uncertainty and sensitivity analysis, from the estimation of sensitivity indices to the visual representation of the results. It implements several state-of-the-art first and total-order estimators and allows the computation of up to fourth-order effects, as well as of the approximation error, in a swift and user-friendly way. Its flexibility makes it also appropriate for models with either a scalar or a multivariate output. We illustrate its functionality by conducting a variance-based sensitivity analysis of three classic models: the Sobol’ (1998) G function, the logistic population growth model of Verhulst (1845), and the spruce budworm and forest model of Ludwig, Jones, and Holling (1976).
Rパッケージsensobolは、感度指数の推定から結果の視覚的表現まで、分散に基づく不確実性分析と感度分析を行うための関数を提供します。最先端の一次および全次(total-order)の推定量を複数実装しており、最大4次までの効果と近似誤差を、迅速かつユーザーフレンドリーな方法で計算できます。柔軟性が高いため、スカラー出力のモデルにも多変量出力のモデルにも適しています。Sobol’ (1998)のG関数、Verhulst (1845)のロジスティック人口成長モデル、Ludwig, Jones, and Holling (1976)のトウヒノシントメハマキ(spruce budworm)と森林のモデルという3つの古典的なモデルについて分散に基づく感度分析を行い、その機能を説明します。
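
To make the idea of a first-order variance-based index concrete, the sketch below evaluates the Sobol' G function, for which the first-order indices have a known closed form, and compares them with a simple Monte Carlo estimate. It does not use sensobol. The estimator is the commonly cited Saltelli-style formula built from two independent sample matrices A and B, plain pseudo-random sampling is assumed, and the function names and the coefficient vector `a` are hypothetical choices for illustration.

```python
import random


def sobol_g(x, a):
    """Sobol' G function with inputs x_i ~ Uniform(0, 1) and coefficients a_i >= 0."""
    y = 1.0
    for xi, ai in zip(x, a):
        y *= (abs(4.0 * xi - 2.0) + ai) / (1.0 + ai)
    return y


def analytic_first_order(a):
    """Closed-form first-order indices: V_i = (1/3) / (1 + a_i)^2, V = prod(1 + V_i) - 1."""
    partial = [(1.0 / 3.0) / (1.0 + ai) ** 2 for ai in a]
    total = 1.0
    for v in partial:
        total *= 1.0 + v
    total -= 1.0
    return [v / total for v in partial]


def mc_first_order(f, d, n, rng):
    """Monte Carlo first-order indices from two independent n x d sample matrices."""
    A = [[rng.random() for _ in range(d)] for _ in range(n)]
    B = [[rng.random() for _ in range(d)] for _ in range(n)]
    fA = [f(row) for row in A]
    fB = [f(row) for row in B]
    pooled = fA + fB
    mean = sum(pooled) / len(pooled)
    var = sum((v - mean) ** 2 for v in pooled) / (len(pooled) - 1)
    indices = []
    for i in range(d):
        acc = 0.0
        for k in range(n):
            row = A[k][:]
            row[i] = B[k][i]  # A with column i taken from B
            acc += fB[k] * (f(row) - fA[k])
        indices.append(acc / n / var)
    return indices


if __name__ == "__main__":
    a = [0.0, 1.0, 9.0]  # smaller a_i means a more influential input
    rng = random.Random(42)
    print("analytic:", analytic_first_order(a))
    print("estimate:", mc_first_order(lambda x: sobol_g(x, a), len(a), 20000, rng))
```

Inputs with small `a_i` dominate the output variance, which is why this function is a convenient benchmark for checking an estimator against known values.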

The R Package stagedtrees for Structural Learning of Stratified Staged Trees
層別ステージドツリーの構造学習のためのRパッケージstagedtrees

stagedtrees is an R package which includes several algorithms for learning the structure of staged trees and chain event graphs from data. Score-based and clustering-based algorithms are implemented, as well as various functionalities to plot the models and perform inference. The capabilities of stagedtrees are illustrated using mainly two datasets both included in the package or bundled in R.
stagedtreesは、ステージドツリー(staged tree)とチェーンイベントグラフ(chain event graph)の構造をデータから学習するための複数のアルゴリズムを含むRパッケージです。スコアベースおよびクラスタリングベースのアルゴリズムに加え、モデルをプロットしたり推論を行ったりするためのさまざまな機能が実装されています。stagedtreesの機能は、主に2つのデータセット(いずれもパッケージに含まれているか、Rに同梱されているもの)を使用して示されています。

NeuralSens: Sensitivity Analysis of Neural Networks
NeuralSens：ニューラルネットワークの感度分析

This article presents the NeuralSens package that can be used to perform sensitivity analysis of neural networks using the partial derivatives method. The main function of the package calculates the partial derivatives of the output with regard to the input variables of a multi-layer perceptron model, which can be used to evaluate variable importance based on sensitivity measures and characterize relationships between input and output variables. Methods to calculate partial derivatives are provided for objects trained using common neural network packages in R, and a ‘numeric’ method is provided for objects from packages which are not included. The package also includes functions to plot the information obtained from the sensitivity analysis. The article contains an overview of techniques for obtaining information from neural network models, a theoretical foundation of how partial derivatives are calculated, a description of the package functions, and applied examples to compare NeuralSens functions with analogous functions from other available R packages.
本論文では、偏導関数法によるニューラルネットワークの感度分析に使用できるNeuralSensパッケージを紹介します。パッケージの主要な関数は、多層パーセプトロンモデルの入力変数に関する出力の偏導関数を計算します。これは、感度の尺度に基づいて変数の重要度を評価したり、入力変数と出力変数の関係を特徴付けたりするために使用できます。偏導関数を計算するメソッドは、Rの一般的なニューラルネットワークパッケージで学習されたオブジェクトに対して提供されており、対応していないパッケージのオブジェクトに対しては‘numeric’メソッドが提供されています。このパッケージには、感度分析から得られた情報をプロットする関数も含まれています。本論文には、ニューラルネットワークモデルから情報を得るための手法の概要、偏導関数の計算方法の理論的基礎、パッケージの関数の説明、およびNeuralSensの関数を他の利用可能なRパッケージの類似関数と比較する適用例が含まれています。

econet: An R Package for Parameter-Dependent Network Centrality Measures
econet：パラメータ依存ネットワーク中心性測度のためのRパッケージ

The R package econet provides methods for estimating parameter-dependent network centrality measures with linear-in-means models. Both nonlinear least squares and maximum likelihood estimators are implemented. The methods allow for both link and node heterogeneity in network effects, endogenous network formation and the presence of unconnected nodes. The routines also compare the explanatory power of parameter-dependent network centrality measures with those of standard measures of network centrality. Benefits and features of the econet package are illustrated using data from Battaglini and Patacchini (2018) and Battaglini, Leone Sciabolazza, and Patacchini (2020).
Rパッケージeconetは、linear-in-meansモデルを用いてパラメータ依存のネットワーク中心性測度を推定する方法を提供します。非線形最小二乗推定量と最尤推定量の両方が実装されています。これらの手法では、ネットワーク効果におけるリンクとノード双方の異質性、内生的なネットワーク形成、および非接続ノードの存在を扱うことができます。また、各ルーチンは、パラメータ依存のネットワーク中心性測度の説明力を、標準的なネットワーク中心性測度の説明力と比較します。econetパッケージの利点と特徴は、Battaglini and Patacchini (2018)およびBattaglini, Leone Sciabolazza, and Patacchini (2020)のデータを使用して示されています。

Event History Regression with Pseudo-Observations: Computational Approaches and an Implementation in R
疑似観測値によるイベント履歴回帰:計算アプローチとRでの実装

Due to tradition and ease of estimation, the vast majority of clinical and epidemiological papers with time-to-event data report hazard ratios from Cox proportional hazards regression models. Although hazard ratios are well known, they can be difficult to interpret, particularly as causal contrasts, in many settings. Nonparametric or fully parametric estimators allow for the direct estimation of more easily causally interpretable estimands such as the cumulative incidence and restricted mean survival. However, modeling these quantities as functions of covariates is limited to a few categorical covariates with nonparametric estimators, and often requires simulation or numeric integration with parametric estimators. Combining pseudo-observations based on non-parametric estimands with parametric regression on the pseudo-observations allows for the best of these two approaches and has many nice properties. In this paper, we develop a user friendly, easy to understand way of doing event history regression for the cumulative incidence and the restricted mean survival, using the pseudo-observation framework for estimation. The interface uses the well known formulation of a generalized linear model and allows for features including plotting of residuals, the use of sampling weights, and correct variance estimation.
伝統と推定の容易さから、イベント発生までの時間(time-to-event)データを扱う臨床論文と疫学論文の大多数は、Cox比例ハザード回帰モデルによるハザード比を報告しています。ハザード比はよく知られていますが、多くの状況で、特に因果的な対比として解釈するのが難しい場合があります。ノンパラメトリック推定量または完全なパラメトリック推定量を使用すると、累積発生率や制限付き平均生存時間など、因果的により解釈しやすい推定対象(estimand)を直接推定できます。ただし、これらの量を共変量の関数としてモデル化することは、ノンパラメトリック推定量では少数のカテゴリカル共変量に限られ、パラメトリック推定量ではシミュレーションや数値積分が必要になることが多くあります。ノンパラメトリックな推定対象に基づく疑似観測値と、その疑似観測値に対するパラメトリック回帰を組み合わせると、これら2つのアプローチの長所を生かすことができ、多くの優れた性質が得られます。本論文では、推定に疑似観測値の枠組みを用いて、累積発生率と制限付き平均生存時間に対するイベント履歴回帰を行うための、ユーザーフレンドリーで理解しやすい方法を開発します。インターフェイスには一般化線形モデルのよく知られた定式化が用いられ、残差のプロット、サンプリング重みの使用、正しい分散推定などの機能を利用できます。
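
The pseudo-observation idea can be sketched in a few lines. For an estimator of a quantity theta computed from n subjects, the i-th jackknife pseudo-observation is n * theta_hat - (n - 1) * theta_hat_without_i, and these values then serve as the response in an ordinary regression. The example below is an assumption-laden illustration rather than the paper's implementation: it uses the Kaplan-Meier survival probability at a fixed time t as theta (with a single event type the cumulative incidence is simply one minus this), and the function names are hypothetical.

```python
def km_survival(times, events, t):
    """Kaplan-Meier estimate of S(t) = P(T > t); events[i] is 1 for an event, 0 if censored."""
    s = 1.0
    event_times = sorted({ti for ti, ei in zip(times, events) if ei and ti <= t})
    for u in event_times:
        at_risk = sum(1 for ti in times if ti >= u)
        deaths = sum(1 for ti, ei in zip(times, events) if ei and ti == u)
        s *= 1.0 - deaths / at_risk
    return s


def pseudo_observations(times, events, t):
    """Leave-one-out jackknife pseudo-observations for S(t)."""
    n = len(times)
    full = km_survival(times, events, t)
    pseudo = []
    for i in range(n):
        times_loo = times[:i] + times[i + 1:]
        events_loo = events[:i] + events[i + 1:]
        pseudo.append(n * full - (n - 1) * km_survival(times_loo, events_loo, t))
    return pseudo


if __name__ == "__main__":
    times = [1.0, 2.0, 3.0, 4.0, 5.0]
    events = [1, 0, 1, 1, 0]  # two censored subjects
    print(pseudo_observations(times, events, t=3.5))
```

Without censoring the pseudo-observation for subject i reduces to the indicator that subject i survived past t, which is why regressing on pseudo-observations generalizes ordinary regression on a fully observed outcome.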

More on Multidimensional Scaling and Unfolding in R: smacof Version 2
Rでの多次元スケーリングと展開の詳細：smacofバージョン2

The smacof package offers a comprehensive implementation of multidimensional scaling (MDS) techniques in R. Since its first publication (De Leeuw and Mair 2009b) the functionality of the package has been enhanced, and several additional methods, features and utilities were added. Major updates include a complete re-implementation of multidimensional unfolding allowing for monotone dissimilarity transformations, including row-conditional, circular, and external unfolding. Additionally, the constrained MDS implementation was extended in terms of optimal scaling of the external variables. Further package additions include various tools and functions for goodness-of-fit assessment, unidimensional scaling, gravity MDS, asymmetric MDS, Procrustes, and MDS biplots. All these new package functionalities are illustrated using a variety of real-life applications.
smacofパッケージは、Rでの多次元スケーリング(MDS)手法の包括的な実装を提供します。最初の公開(De Leeuw and Mair 2009b)以来、パッケージの機能は強化され、いくつかの追加のメソッド、機能、およびユーティリティが追加されました。主な更新には、多次元アンフォールディングの完全な再実装が含まれ、行条件付き、円形、外部アンフォールディングなど、単調な非類似度変換が可能になります。さらに、制約付きMDS実装は、外部変数の最適なスケーリングの観点から拡張されました。さらに、適合度評価、単次元スケーリング、重力MDS、非対称MDS、Procrustes、MDSバイプロットのためのさまざまなツールや機能も追加されています。これらの新しいパッケージ機能はすべて、さまざまな実際のアプリケーションを使用して示されています。

Code Snippets

Monotone Regression: A Simple and Fast O(n) PAVA Implementation
単調回帰：シンプルで高速なO(n)のPAVA実装

Efficient coding and improvements in the execution order of the up-and-down-blocks algorithm for monotone or isotonic regression leads to a significant increase in speed as well as a short and simple O(n) implementation. Algorithms that use monotone regression as a subroutine, e.g., unimodal or bivariate monotone regression, also benefit from the acceleration. A substantive comparison with and characterization of currently available implementations provides an extensive overview of up-and-down-blocks implementations for the pool-adjacent-violators algorithm for simple linear ordered monotone regression.
単調回帰(アイソトニック回帰)のためのup-and-down-blocksアルゴリズムについて、効率的なコーディングと実行順序の改善を行うことで、速度が大幅に向上するとともに、短くシンプルなO(n)の実装が得られます。単調回帰をサブルーチンとして使用するアルゴリズム(単峰回帰や二変量単調回帰など)も、この高速化の恩恵を受けます。現在利用可能な実装との実質的な比較とその特徴付けにより、単純な線形順序の単調回帰に対するpool-adjacent-violatorsアルゴリズム(PAVA)のup-and-down-blocks実装について、広範な概観を提供します。
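
For readers unfamiliar with the pool-adjacent-violators algorithm, the following is a minimal stack-based sketch of a linear-time least-squares monotone (nondecreasing) fit. It is a generic textbook formulation and does not reproduce the execution-order improvements described in the article. The function name `pava` and the block representation are assumptions of this sketch.

```python
def pava(y, w=None):
    """Weighted least-squares nondecreasing fit to y via pool-adjacent-violators."""
    n = len(y)
    if w is None:
        w = [1.0] * n
    # Each block stores [weighted mean, total weight, number of pooled points].
    blocks = []
    for yi, wi in zip(y, w):
        blocks.append([float(yi), float(wi), 1])
        # Pool backwards while the last two blocks violate monotonicity.
        while len(blocks) > 1 and blocks[-2][0] > blocks[-1][0]:
            m2, w2, c2 = blocks.pop()
            m1, w1, c1 = blocks.pop()
            total = w1 + w2
            blocks.append([(m1 * w1 + m2 * w2) / total, total, c1 + c2])
    # Expand the blocks back to one fitted value per observation.
    fit = []
    for mean, _, count in blocks:
        fit.extend([mean] * count)
    return fit


if __name__ == "__main__":
    print(pava([1, 3, 2, 4, 3, 5]))
```

Every observation is pushed once and every merge removes one block, so the total work is linear in the number of observations.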

Book Reviews

Pro Data Visualization Using R and JavaScript: Analyze and Visualize Key Data on the Web
RとJavaScriptを使用したプロフェッショナルなデータの視覚化：Web上の主要なデータを分析および視覚化する

Doing Meta-Analysis with R – A Hands-On Guide
Rによるメタアナリシスの実施-ハンズオンガイド
