Journal of Statistical Software: Volume 107 Articles

This page lists the articles published in Journal of Statistical Software, Volume 107, with their titles and abstracts.

Articles

Modeling Population Growth in R with the biogrowth Package

The growth of populations is of interest in a broad variety of fields, such as epidemiology, economics or biology. Although a large variety of growth models are available in the scientific literature, their application usually requires advanced knowledge of mathematical programming and statistical inference, especially when modelling growth under dynamic environmental conditions. This article presents the biogrowth package for R, which implements functions for modelling the growth of populations. It can predict growth under static or dynamic environments, considering the effect of an arbitrary number of environmental factors. Moreover, it can be used to fit growth models to data gathered under static or dynamic environmental conditions. The package allows the user to fix any model parameter prior to the fit, an approach that can mitigate the identifiability issues associated with growth models. The package includes common S3 methods for visualization and statistical analysis (summary of the fit, predictions, etc.), easing the interpretation of results. It also includes functions for model comparison and selection. We illustrate the functions in biogrowth using examples from food science and economics.
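
To make "predicting growth under static or dynamic environments" concrete, here is a minimal Python sketch. It is not biogrowth's R code or API. It assumes a logistic primary model and a Ratkowsky-type square-root secondary model that links the growth rate to temperature. The model choice, the parameter values and the function names are illustrative assumptions. In this setup, a dynamic environment is simply a temperature profile that changes over time, and the growth equation is integrated with a plain forward Euler scheme.

```python
import math


def ratkowsky_rate(temp, b=0.03, t_min=5.0):
    """Square-root (Ratkowsky-type) secondary model: sqrt(mu) = b * (T - Tmin).

    Returns the specific growth rate mu; no growth at or below t_min.
    Parameter values are illustrative assumptions, not fitted to data.
    """
    return (b * max(temp - t_min, 0.0)) ** 2


def predict_logistic(n0, k, temp_profile, t_end, dt=0.001):
    """Integrate dN/dt = mu(T(t)) * N * (1 - N / K) with forward Euler.

    temp_profile maps time -> temperature: a constant function gives a
    static environment, any time-varying function a dynamic one.
    """
    n = n0
    steps = int(round(t_end / dt))
    for i in range(steps):
        mu = ratkowsky_rate(temp_profile(i * dt))
        n += dt * mu * n * (1.0 - n / k)
    return n


def logistic_exact(n0, k, mu, t):
    """Closed-form logistic solution, valid only for a constant growth rate."""
    return k / (1.0 + (k / n0 - 1.0) * math.exp(-mu * t))


# Static environment: constant 25 degrees, so mu = (0.03 * 20) ** 2 = 0.36.
static = predict_logistic(1.0, 1000.0, lambda t: 25.0, t_end=20.0)
# Dynamic environment: 25 degrees for 10 time units, then 4 degrees (below t_min).
dynamic = predict_logistic(1.0, 1000.0, lambda t: 25.0 if t < 10.0 else 4.0, t_end=20.0)
print(round(static, 1), round(dynamic, 1))
```

At constant temperature, the Euler result should closely track the closed-form logistic curve. In the dynamic profile, growth stops once the temperature falls below `t_min`. A real tool would use an adaptive ODE solver rather than a fixed step.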

carat: An R Package for Covariate-Adaptive Randomization in Clinical Trials

Covariate-adaptive randomization is gaining popularity in clinical trials because it enables the generation of allocations that are balanced with respect to covariates. Over the past decade, substantial progress has been made both in new randomization procedures and in the theoretical properties of the associated inference. However, these results are scattered across the literature, and no single toolkit exists that clinical trial practitioners and researchers can use to conduct and evaluate these methods. The R package carat is proposed to address this need. It supports a broad range of covariate-adaptive randomization and testing procedures, from the most common classical methods to recent developments in the field. The package contains comprehensive evaluation and comparison tools for both randomization procedures and tests, which enable power analyses that assist in planning a covariate-adaptive clinical trial. The package also implements a command-line interface for interactive allocation, as is typical in real-world applications. In this paper, the features and functionalities of carat are presented.
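
One classical covariate-adaptive procedure is Pocock and Simon's minimization. Each new patient is assigned, with high probability, to the arm that minimizes a weighted covariate imbalance. The Python sketch below illustrates this for two arms. It is not carat's implementation. The function names, the range-type imbalance measure, the equal weights and the biased-coin probability p = 0.85 are all illustrative assumptions.

```python
import random


def imbalance_if_assigned(arm, new_levels, history, weights):
    """Weighted sum over covariates of |n_arm0 - n_arm1| within the new
    patient's covariate levels, computed as if the patient joined `arm`."""
    total = 0.0
    for k, level in enumerate(new_levels):
        counts = [0, 0]
        for levels, assigned in history:
            if levels[k] == level:
                counts[assigned] += 1
        counts[arm] += 1  # hypothetical assignment of the new patient
        total += weights[k] * abs(counts[0] - counts[1])
    return total


def minimization_assign(new_levels, history, weights, p=0.85, rng=random):
    """Pocock-Simon minimization with a biased coin.

    history: list of (covariate_levels_tuple, arm) for enrolled patients.
    Returns arm 0 or 1 for the new patient.
    """
    d0 = imbalance_if_assigned(0, new_levels, history, weights)
    d1 = imbalance_if_assigned(1, new_levels, history, weights)
    if d0 == d1:
        return rng.randrange(2)  # tie: fair coin
    preferred = 0 if d0 < d1 else 1
    return preferred if rng.random() < p else 1 - preferred


# Sequential enrollment: patients arrive one at a time with two binary covariates.
rng = random.Random(2023)
history = []
for _ in range(100):
    levels = (rng.randrange(2), rng.randrange(2))
    history.append((levels, minimization_assign(levels, history, [1.0, 1.0], rng=rng)))
n_arm1 = sum(arm for _, arm in history)
print("arm sizes:", 100 - n_arm1, n_arm1)
```

Patients are allocated one at a time as they enroll, so the assignment function is called inside the enrollment loop. This mirrors the interactive allocation setting that the abstract describes.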

REndo: Internal Instrumental Variables to Address Endogeneity

Endogeneity is a common problem in any causal analysis. It arises when the independence assumption between an explanatory variable and the error in a statistical model is violated. The causes of endogeneity are manifold and include response bias in surveys, omission of important explanatory variables, or simultaneity between explanatory and response variables. Instrumental variable estimation provides a possible solution. However, valid and strong external instruments are difficult to find. Consequently, internal instrumental variable approaches have been proposed to correct for endogeneity without relying on external instruments. The R package REndo implements various internal instrumental variable approaches, namely latent instrumental variables estimation (Ebbes, Wedel, Boeckenholt, and Steerneman 2005), higher moments estimation (Lewbel 1997), heteroscedastic error estimation (Lewbel 2012), joint estimation using copulas (Park and Gupta 2012), and multilevel generalized method of moments estimation (Kim and Frees 2007). Package usage is illustrated on simulated and real-world data.
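
The sketch below illustrates an internal instrument on simulated data. It builds a heteroscedasticity-based instrument in the spirit of Lewbel (2012), using numpy rather than REndo, which is an R package. The data-generating process, the sample size and the plain just-identified IV estimator are assumptions chosen for brevity. In this design the first-stage error is heteroscedastic in `x`, and the endogeneity comes from a common shock `c` shared by both errors.

```python
import numpy as np

rng = np.random.default_rng(0)
n = 20_000

# Simulated data; every value here is an illustrative assumption.
x = rng.uniform(0.0, 2.0, n)                 # exogenous regressor
c = rng.normal(size=n)                       # common shock -> endogeneity
e_v = rng.normal(size=n) * (0.5 + x)         # heteroscedastic first-stage noise
v = c + e_v
u = c + rng.normal(size=n)                   # structural error, correlated with v
p = 1.0 + 0.5 * x + v                        # endogenous regressor
y = 2.0 + 1.0 * x + 1.5 * p + u              # true coefficient on p is 1.5

ones = np.ones(n)
W = np.column_stack([ones, x, p])            # regressors of the structural equation

# Naive OLS: biased because p is correlated with u.
beta_ols = np.linalg.lstsq(W, y, rcond=None)[0]

# Step 1: regress p on the exogenous variables and keep the residuals.
X_exo = np.column_stack([ones, x])
v_hat = p - X_exo @ np.linalg.lstsq(X_exo, p, rcond=None)[0]

# Step 2: generated instrument (x - mean(x)) * v_hat.
z_gen = (x - x.mean()) * v_hat
Z = np.column_stack([ones, x, z_gen])

# Step 3: just-identified IV estimator, beta = (Z'W)^{-1} Z'y.
beta_iv = np.linalg.solve(Z.T @ W, Z.T @ y)

print("OLS slope on p:", round(beta_ols[2], 3))
print("IV  slope on p:", round(beta_iv[2], 3))
```

With this design, OLS should overstate the coefficient on `p`, and the IV estimate should land near the true value 1.5. The construction depends on the method's identifying assumptions. The first-stage errors must be heteroscedastic, and the covariance between the two errors must not depend on `x`. If these assumptions fail, the generated instrument is not valid.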

DataFrames.jl: Flexible and Fast Tabular Data in Julia

DataFrames.jl is a package written for and in the Julia language offering flexible and efficient handling of tabular data sets in memory. Thanks to Julia’s unique strengths, it provides an appealing set of features: Rich support for standard data processing tasks and excellent flexibility and efficiency for more advanced and non-standard operations. We present the fundamental design of the package and how it compares with implementations of data frames in other languages, its main features, performance, and possible extensions. We conclude with a practical illustration of typical data processing operations.
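
Split-apply-combine is a typical example of the data processing operations the abstract mentions. The Python sketch below mimics it on a column-oriented table stored as a dict of equal-length lists, which loosely reflects how a data frame holds its data as a collection of columns. The function and column names are hypothetical, and sorting the group keys is an arbitrary choice. In DataFrames.jl itself, the same operation would be written with its `groupby` and `combine` functions.

```python
from collections import defaultdict


def groupby_combine(table, key, column, reducer, out_name):
    """Split-apply-combine on a column-oriented table (dict of equal-length lists).

    Rows are grouped by the values in `key`, `reducer` is applied to each
    group's values of `column`, and a new column-oriented table with one row
    per group (keys sorted) is returned.
    """
    groups = defaultdict(list)
    for k, value in zip(table[key], table[column]):
        groups[k].append(value)
    keys = sorted(groups)
    return {key: keys, out_name: [reducer(groups[k]) for k in keys]}


def mean(values):
    return sum(values) / len(values)


sales = {
    "region": ["north", "south", "north", "east", "south"],
    "amount": [10.0, 4.0, 6.0, 3.0, 8.0],
}
summary = groupby_combine(sales, "region", "amount", mean, "amount_mean")
print(summary)
# prints {'region': ['east', 'north', 'south'], 'amount_mean': [3.0, 8.0, 6.0]}
```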

ARCHModels.jl: Estimating ARCH Models in Julia

This paper introduces ARCHModels.jl, a package for the Julia programming language that implements a number of univariate and multivariate autoregressive conditional heteroskedasticity models. This model class is the workhorse tool for modeling the conditional volatility of financial assets. The distinguishing feature of these models is that they model the latent volatility as a (deterministic) function of past returns and volatilities. This recursive structure results in loop-heavy code, which Julia, thanks to its just-in-time compiler, is well equipped to handle. As a result, the entire package is written in Julia, without any binary dependencies. We benchmark the performance of ARCHModels.jl against popular implementations in MATLAB, R, and Python, and illustrate its use in a detailed case study.
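
The recursion that makes this code loop-heavy is easiest to see in the GARCH(1,1) case. The pure-Python sketch below is not ARCHModels.jl's API. It computes the conditional variances and the Gaussian log-likelihood for zero-mean returns. Starting the recursion at the unconditional variance is one common convention and is an assumption here.

```python
import math


def garch11_variances(returns, omega, alpha, beta):
    """Conditional variances of a GARCH(1,1) model:
    sigma2[t] = omega + alpha * r[t-1]**2 + beta * sigma2[t-1].

    The recursion starts at the unconditional variance
    omega / (1 - alpha - beta), which requires alpha + beta < 1.
    """
    if alpha + beta >= 1:
        raise ValueError("covariance stationarity requires alpha + beta < 1")
    sigma2 = [omega / (1.0 - alpha - beta)]
    for r_prev in returns[:-1]:
        sigma2.append(omega + alpha * r_prev ** 2 + beta * sigma2[-1])
    return sigma2


def garch11_loglik(returns, omega, alpha, beta):
    """Gaussian log-likelihood of zero-mean returns under GARCH(1,1)."""
    sigma2 = garch11_variances(returns, omega, alpha, beta)
    return -0.5 * sum(
        math.log(2.0 * math.pi) + math.log(s2) + r ** 2 / s2
        for r, s2 in zip(returns, sigma2)
    )


returns = [0.5, -1.2, 0.3, 2.0, -0.4]  # toy data
print(garch11_variances(returns, omega=0.05, alpha=0.1, beta=0.85))
print(garch11_loglik(returns, omega=0.05, alpha=0.1, beta=0.85))
```

Each variance depends on the previous one, so the computation is naturally written as a sequential loop. In a JIT-compiled language such as Julia, this kind of loop avoids the interpreter overhead that pure Python incurs.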

varTestnlme: An R Package for Variance Components Testing in Linear and Nonlinear Mixed-Effects Models

The issue of variance components testing arises naturally when building mixed-effects models, either to decide which effects should be modeled as fixed or random or to build parsimonious models. While tests for fixed effects are available in R for models fitted with lme4, tools are missing when it comes to random effects. The varTestnlme package for R aims to fill this gap. It allows users to test whether a subset of the variances and covariances corresponding to a subset of the random effects is equal to zero, using the asymptotic properties of the likelihood ratio test statistic. It also makes it possible to test for fixed effects and variance components simultaneously. It can be used for linear, generalized linear or nonlinear mixed-effects models fitted via lme4, nlme or saemix. The numerical methods used to implement the test procedure are detailed, and examples based on different real datasets using different mixed models are provided. The theoretical properties of the likelihood ratio test used are recalled.
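
Variance components need special treatment because the null hypothesis puts a variance on the boundary of its parameter space. As a result, the likelihood ratio statistic is asymptotically a chi-bar-square mixture rather than a plain chi-square. The minimal Python sketch below assumes the classical case of testing a single variance component, where the mixture is 50:50 between chi2_0 (a point mass at zero) and chi2_1. In general, the mixture weights depend on the model's Fisher information and must be computed numerically, which is presumably among the numerical methods the paper details. The helper names are hypothetical.

```python
import math


def chi2_sf(df, t):
    """Survival function P(chi2_df > t) for an integer df >= 0.

    df = 0 is the point mass at zero, so its survival is 0 for t >= 0.
    """
    if t < 0:
        return 1.0
    if df == 0:
        return 0.0  # point mass at zero
    x = t / 2.0
    if df % 2 == 0:
        # Even df: closed form as a finite Poisson sum.
        return math.exp(-x) * sum(x ** j / math.factorial(j) for j in range(df // 2))
    # Odd df: erfc term plus a finite series.
    tail = sum(x ** (j - 0.5) / math.gamma(j + 0.5) for j in range(1, (df - 1) // 2 + 1))
    return math.erfc(math.sqrt(x)) + math.exp(-x) * tail


def chibar_pvalue(t, weights):
    """p-value under a chi-bar-square law: sum_k weights[k] * P(chi2_k > t),
    where weights[k] is the mixing weight of the chi-square with k df."""
    return sum(w * chi2_sf(k, t) for k, w in enumerate(weights))


# H0: sigma^2 = 0 for a single random effect -> 50:50 mixture of chi2_0 and chi2_1.
lrt = 2.9  # hypothetical likelihood ratio statistic
print(chibar_pvalue(lrt, [0.5, 0.5]))
```

Compared with a naive chi2_1 reference distribution, the mixture halves the p-value in this case. Ignoring the boundary therefore makes the test conservative.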

Panel Data Visualization in R (panelView) and Stata (panelview)

We develop an R package panelView and a Stata package panelview for panel data visualization. They are designed to assist causal analysis with panel data and have three main functionalities: (1) They plot the treatment status and missing values in a panel dataset; (2) they visualize the temporal dynamics of the main variables of interest; and (3) they depict the bivariate relationships between a treatment variable and an outcome variable either by unit or in aggregate. These tools can help researchers better understand their panel datasets before conducting statistical analysis.
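
The first functionality, plotting treatment status and missing values, amounts to laying out a unit-by-period grid. The plain-text Python sketch below shows the idea with hypothetical function names and toy records. It is not panelView's code; the packages themselves draw this as a graphical plot in R and Stata.

```python
def treatment_matrix(records, units, periods):
    """Build a unit x period grid of treatment status from long-format records.

    records: iterable of (unit, period, treated) with treated in {0, 1};
    unit-period pairs absent from `records` are marked as missing (None).
    """
    status = {(u, t): d for u, t, d in records}
    return [[status.get((u, t)) for t in periods] for u in units]


def render(matrix, units, periods):
    """Text rendering: '#' treated, '.' control, ' ' missing."""
    symbol = {1: "#", 0: ".", None: " "}
    lines = ["      " + "".join(str(t)[-1] for t in periods)]
    for u, row in zip(units, matrix):
        lines.append(f"{u:<6}" + "".join(symbol[d] for d in row))
    return "\n".join(lines)


records = [
    ("A", 2001, 0), ("A", 2002, 0), ("A", 2003, 1), ("A", 2004, 1),
    ("B", 2001, 0), ("B", 2002, 0), ("B", 2004, 0),  # 2003 missing for B
    ("C", 2001, 0), ("C", 2002, 1), ("C", 2003, 1), ("C", 2004, 1),
]
units, periods = ["A", "B", "C"], [2001, 2002, 2003, 2004]
print(render(treatment_matrix(records, units, periods), units, periods))
```

The grid makes staggered adoption (unit C is treated earlier than unit A) and gaps in the panel visible at a glance, before any model is fitted.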

GMM Estimators for Binary Spatial Models in R

Despite the wide availability of software to estimate cross-sectional spatial models, there are only a few functions to estimate models dealing with spatial limited dependent variables. This paper fills this gap by introducing the new R package spldv. The package is based on generalized method of moments (GMM) estimators and includes a series of one- and two-step estimators based on different choices of the weighting matrix for the moment conditions in the first step, and different estimators for the variance-covariance matrix of the estimated coefficients. An important feature of spldv is that users can estimate the spatial Durbin model and compute the direct, indirect, and total effects in a user-friendly and flexible way.
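
The direct, indirect and total effects in a spatial Durbin model come from the matrix of partial derivatives S = (I - rho W)^{-1} (beta I + theta W). The numpy sketch below computes the usual scalar summaries from it for the linear index. It is not spldv's code. For the binary models that spldv targets, the effects on response probabilities are, roughly, this matrix scaled by normal-density terms, so the sketch covers only the linear-index part. The weight matrix and parameter values are illustrative assumptions.

```python
import numpy as np


def sdm_effects(W, rho, beta, theta):
    """Average direct, indirect and total effects of one regressor in a
    spatial Durbin model y = rho*W y + x*beta + W x*theta + e (linear index).

    S = (I - rho W)^{-1} (beta I + theta W) holds dy_i/dx_j; direct is the
    mean of diag(S), total is the mean row sum, indirect = total - direct.
    """
    n = W.shape[0]
    identity = np.eye(n)
    S = np.linalg.solve(identity - rho * W, beta * identity + theta * W)
    direct = np.trace(S) / n
    total = S.sum() / n
    return direct, total - direct, total


# Four units on a line with rook contiguity, row-standardized.
A = np.array([[0, 1, 0, 0],
              [1, 0, 1, 0],
              [0, 1, 0, 1],
              [0, 0, 1, 0]], dtype=float)
W = A / A.sum(axis=1, keepdims=True)
direct, indirect, total = sdm_effects(W, rho=0.4, beta=1.0, theta=0.5)
print(direct, indirect, total)
```

With a row-standardized W, the total effect reduces to (beta + theta) / (1 - rho), here 2.5. This gives a quick sanity check on the computation.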

Efficient Multiple Imputation for Diverse Data in Python and R: MIDASpy and rMIDAS

This paper introduces software packages for efficiently imputing missing data using deep learning methods in Python (MIDASpy) and R (rMIDAS). The packages implement a recently developed approach to multiple imputation known as MIDAS, which involves introducing additional missing values into the dataset, attempting to reconstruct these values with a type of unsupervised neural network known as a denoising autoencoder, and using the resulting model to draw imputations of originally missing data. These steps are executed by a fast and flexible algorithm that expands both the quantity and the range of data that can be analyzed with multiple imputation. To help users optimize the algorithm for their particular application, MIDASpy and rMIDAS offer a host of user-friendly tools for calibrating and validating the imputation model. We provide a detailed guide to these functionalities and demonstrate their usage on a large real dataset.
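
The corruption-and-reconstruction idea behind a denoising autoencoder can be illustrated without the neural network itself. The minimal pure-Python sketch below is not the API of MIDASpy or rMIDAS. The function names, the drop rate and the toy data are assumptions. Extra cells are hidden at random among the observed ones. The reconstruction loss is then computed only over cells observed in the original data, because originally missing cells have no ground truth.

```python
import random


def corrupt(rows, drop_rate, rng):
    """Denoising-autoencoder style corruption.

    Randomly hides a fraction of the observed cells (None marks a cell that
    is missing in the original data). Returns the corrupted copy and the set
    of (row, col) positions that were artificially removed.
    """
    corrupted, removed = [], set()
    for i, row in enumerate(rows):
        new_row = list(row)
        for j, value in enumerate(row):
            if value is not None and rng.random() < drop_rate:
                new_row[j] = None
                removed.add((i, j))
        corrupted.append(new_row)
    return corrupted, removed


def observed_mse(original, reconstruction):
    """Mean squared reconstruction error over cells observed in the original
    data; originally missing cells are skipped."""
    errors = [
        (o - r) ** 2
        for orig_row, rec_row in zip(original, reconstruction)
        for o, r in zip(orig_row, rec_row)
        if o is not None
    ]
    return sum(errors) / len(errors)


data = [[1.0, None, 3.0], [4.0, 5.0, None], [7.0, 8.0, 9.0]]
noisy, removed = corrupt(data, drop_rate=0.3, rng=random.Random(7))
print(noisy, sorted(removed))
```

A full implementation would feed the corrupted table to the autoencoder and train it to minimize a loss of this kind. Multiple imputations would then be drawn from the trained model.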

hdpGLM: An R Package to Estimate Heterogeneous Effects in Generalized Linear Models Using Hierarchical Dirichlet Process

The existence of latent clusters with different responses to a treatment is a major concern in scientific research, as latent effect heterogeneity often emerges due to latent or unobserved features of the subjects, such as genetic characteristics, personality traits, or hidden motivations. Conventional random- and fixed-effects methods cannot be applied to this heterogeneity if the group markers associated with it are latent or unobserved. Alternative methods that combine regression models and clustering procedures using Dirichlet processes are available, but these methods are complex to implement, especially for non-linear regression models with discrete or binary outcomes. This article discusses the R package hdpGLM as a means of implementing a novel hierarchical Dirichlet process approach, outlined in Ferrari (2020), to estimate mixtures of generalized linear models. The methods implemented make it easy for researchers to investigate heterogeneity in the effect of treatment or background variables and to identify clusters of subjects with differential effects. The package provides several features for out-of-the-box estimation and for generating numerical summaries and visualizations of the results. A comparison with other similar R packages is provided.
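
A basic building block of Dirichlet process mixtures is the stick-breaking construction of cluster weights. The Python sketch below draws truncated stick-breaking weights and latent cluster labels. The truncation level, the concentration parameter `alpha` and the function names are assumptions. The sketch leaves out the hierarchical part of the model and the cluster-specific GLM coefficients that hdpGLM estimates.

```python
import random


def stick_breaking_weights(alpha, truncation, rng):
    """Truncated stick-breaking draw of Dirichlet process mixture weights.

    v_k ~ Beta(1, alpha) and pi_k = v_k * prod_{l<k} (1 - v_l); the last
    weight takes the remaining stick so that the weights sum to one.
    """
    weights, remaining = [], 1.0
    for _ in range(truncation - 1):
        v = rng.betavariate(1.0, alpha)
        weights.append(v * remaining)
        remaining *= 1.0 - v
    weights.append(remaining)
    return weights


def assign_clusters(n, weights, rng):
    """Draw latent cluster labels; in a GLM mixture each cluster would carry
    its own regression coefficients."""
    return rng.choices(range(len(weights)), weights=weights, k=n)


w = stick_breaking_weights(alpha=1.0, truncation=20, rng=random.Random(3))
labels = assign_clusters(50, w, random.Random(4))
print([round(x, 3) for x in w[:5]], sorted(set(labels)))
```

Smaller values of `alpha` concentrate the mass on a few sticks, and therefore on fewer clusters. This is why the concentration parameter controls how much latent heterogeneity the model tends to find.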
