Draws from a Dirichlet process. The four rows use different values of the concentration parameter $\alpha$ (top to bottom: 1, 10, 100 and 1000) and each row contains three repetitions of the same experiment. As seen from the graphs, draws from a Dirichlet process are discrete distributions and they become less concentrated (more spread out) as $\alpha$ increases. The graphs were generated using the stick-breaking process view of the Dirichlet process.

In probability theory, Dirichlet processes (after the distribution associated with Peter Gustav Lejeune Dirichlet) are a family of stochastic processes whose realizations are probability distributions. In other words, a Dirichlet process is a probability distribution whose range is itself a set of probability distributions. It is often used in Bayesian inference to describe the prior knowledge about the distribution of random variables—how likely it is that the random variables are distributed according to one or another particular distribution.

As an example, a bag of 100 real-world dice is a random probability mass function (random pmf)—to sample this random pmf you put your hand in the bag and draw out a die, that is, you draw a pmf. A bag of dice manufactured using a crude process 100 years ago will likely have probabilities that deviate wildly from the uniform pmf, whereas a bag of state-of-the-art dice used by Las Vegas casinos may have barely perceptible imperfections. We can model the randomness of pmfs with the Dirichlet distribution.[1]

The Dirichlet process is specified by a base distribution $H$ and a positive real number $\alpha$ called the concentration parameter (also known as scaling parameter). The base distribution is the expected value of the process, i.e., the Dirichlet process draws distributions "around" the base distribution the way a normal distribution draws real numbers around its mean. However, even if the base distribution is continuous, the distributions drawn from the Dirichlet process are almost surely discrete. The scaling parameter specifies how strong this discretization is: in the limit of $\alpha \to 0$, the realizations are all concentrated at a single value, while in the limit of $\alpha \to \infty$ the realizations become continuous. Between the two extremes the realizations are discrete distributions with less and less concentration as $\alpha$ increases.

The Dirichlet process can also be seen as the infinite-dimensional generalization of the Dirichlet distribution. In the same way as the Dirichlet distribution is the conjugate prior for the categorical distribution, the Dirichlet process is the conjugate prior for infinite, nonparametric discrete distributions. A particularly important application of Dirichlet processes is as a prior probability distribution in infinite mixture models.

The Dirichlet process was formally introduced by Thomas S. Ferguson in 1973.[2] It has since been applied in data mining and machine learning, among others for natural language processing, computer vision and bioinformatics.

Introduction


Dirichlet processes are usually used when modelling data that tends to repeat previous values in a so-called "rich get richer" fashion. Specifically, suppose that the generation of values $X_1, X_2, \ldots$ can be simulated by the following algorithm.

Input: $H$ (a probability distribution called the base distribution), $\alpha$ (a positive real number called the scaling parameter)
For $n \ge 1$:

a) With probability $\frac{\alpha}{\alpha + n - 1}$ draw $X_n$ from $H$.

b) With probability $\frac{n_x}{\alpha + n - 1}$ set $X_n = x$, where $n_x$ is the number of previous observations of $x$.
(Formally, $n_x := |\{j \colon X_j = x \text{ and } j < n\}|$, where $|\cdot|$ denotes the number of elements in the set.)
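
As a concrete illustration, here is a minimal Python sketch of this generative scheme; the standard normal base distribution and $\alpha = 10$ are assumptions made only for the example:

```python
import random

def rich_get_richer_draws(base_sample, alpha, n_draws):
    """Simulate the algorithm above: with probability alpha/(alpha + n - 1)
    draw a fresh value from the base distribution H, otherwise repeat a
    previous value x with probability n_x/(alpha + n - 1)."""
    draws = []
    for n in range(1, n_draws + 1):
        if random.random() < alpha / (alpha + n - 1):
            draws.append(base_sample())          # step a): new draw from H
        else:
            draws.append(random.choice(draws))   # step b): repeat a past value
    return draws

samples = rich_get_richer_draws(lambda: random.gauss(0.0, 1.0), alpha=10.0, n_draws=1000)
print(len(set(samples)), "distinct values among", len(samples), "draws")
```

Choosing uniformly among all previous draws reproduces the weights in step b), since a value that has already occurred $n_x$ times is picked with probability $n_x/(n-1)$.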

At the same time, another common model for data is that the observations $X_1, X_2, \ldots$ are assumed to be independent and identically distributed (i.i.d.) according to some (random) distribution $P$. The goal of introducing Dirichlet processes is to be able to describe the procedure outlined above in this i.i.d. model.

The observations in the algorithm are not independent, since we have to consider the previous results when generating the next value. They are, however, exchangeable. This fact can be shown by calculating the joint probability distribution of the observations and noticing that the resulting formula only depends on which values occur among the observations and how many repetitions they each have. Because of this exchangeability, de Finetti's representation theorem applies and it implies that the observations $X_1, X_2, \ldots$ are conditionally independent given a (latent) distribution $P$. This $P$ is a random variable itself and has a distribution. This distribution (over distributions) is called a Dirichlet process ($\mathrm{DP}$). In summary, this means that we get an equivalent procedure to the above algorithm:

  1. Draw a distribution $P$ from $\mathrm{DP}(H, \alpha)$.
  2. Draw observations $X_1, X_2, \ldots$ independently from $P$.

In practice, however, drawing a concrete distribution is impossible, since its specification requires an infinite amount of information. This is a common phenomenon in the context of Bayesian non-parametric statistics where a typical task is to learn distributions on function spaces, which involve effectively infinitely many parameters. The key insight is that in many applications the infinite-dimensional distributions appear only as an intermediary computational device and are not required for either the initial specification of prior beliefs or for the statement of the final inference.

Formal definition


Given a measurable set $S$, a base probability distribution $H$ and a positive real number $\alpha$, the Dirichlet process $\mathrm{DP}(H, \alpha)$ is a stochastic process whose sample path (or realization, i.e. an infinite sequence of random variates drawn from the process) is a probability distribution over $S$, such that the following holds. For any measurable finite partition of $S$, denoted $\{B_i\}_{i=1}^{n}$, if $X \sim \mathrm{DP}(H, \alpha)$ then:

$$(X(B_1), \ldots, X(B_n)) \sim \mathrm{Dir}(\alpha H(B_1), \ldots, \alpha H(B_n)),$$

where $\mathrm{Dir}$ denotes the Dirichlet distribution and the notation $X \sim D$ means that the random variable $X$ has the distribution $D$.
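
The finite-dimensional marginals can be checked numerically. In the sketch below, the base distribution $N(0,1)$, the value $\alpha = 5$ and the partition $(-\infty, 0), [0, 1), [1, \infty)$ are illustrative assumptions; the partition masses are drawn directly from the Dirichlet marginal that the definition prescribes:

```python
import numpy as np
from scipy.stats import norm

alpha = 5.0
# Masses the base distribution H = N(0, 1) assigns to the partition
# B_1 = (-inf, 0), B_2 = [0, 1), B_3 = [1, inf).
H_masses = np.array([norm.cdf(0.0),
                     norm.cdf(1.0) - norm.cdf(0.0),
                     1.0 - norm.cdf(1.0)])

# By definition, (X(B_1), X(B_2), X(B_3)) ~ Dir(alpha * H(B_1), ...).
partition_masses = np.random.dirichlet(alpha * H_masses, size=100_000)

print("E[X(B_i)] ~", partition_masses.mean(axis=0))
print("H(B_i)    =", H_masses)
```

The empirical means approach $H(B_i)$, consistent with the base distribution being the expected value of the process.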

Alternative views


There are several equivalent views of the Dirichlet process. Besides the formal definition above, the Dirichlet process can be defined implicitly through de Finetti's theorem as described in the first section; this is often called the Chinese restaurant process. A third alternative is the stick-breaking process, which defines the Dirichlet process constructively by writing a distribution sampled from the process as $f(x) = \sum_{k=1}^{\infty} \beta_k \delta_{x_k}(x)$, where $\{x_k\}_{k=1}^{\infty}$ are samples from the base distribution $H$, $\delta_{x_k}$ is an indicator function centered on $x_k$ (zero everywhere except for $\delta_{x_k}(x_k) = 1$) and the weights $\beta_k$ are defined by a recursive scheme that repeatedly samples from the beta distribution $\mathrm{Beta}(1, \alpha)$.

The Chinese restaurant process

Animation of a Chinese restaurant process with scaling parameter $\alpha$. Tables are hidden once their customers can no longer be displayed; however, every table has infinitely many seats. (Recording of an interactive animation.[3])

A widely employed metaphor for the Dirichlet process is based on the so-called Chinese restaurant process. The metaphor is as follows:

Imagine a Chinese restaurant in which customers enter. A new customer sits down at a table with a probability proportional to the number of customers already sitting there. Additionally, a customer opens a new table with a probability proportional to the scaling parameter $\alpha$. After infinitely many customers have entered, one obtains a probability distribution over infinitely many tables. This probability distribution over the tables is a random sample of the probabilities of observations drawn from a Dirichlet process with scaling parameter $\alpha$.

If one associates draws from the base measure $H$ with every table, the resulting distribution over the sample space is a random sample of a Dirichlet process. The Chinese restaurant process is related to the Pólya urn sampling scheme which yields samples from finite Dirichlet distributions.

Because customers sit at a table with a probability proportional to the number of customers already sitting at the table, two properties of the DP can be deduced:

  1. The Dirichlet process exhibits a self-reinforcing property: The more often a given value has been sampled in the past, the more likely it is to be sampled again.
  2. Even if $H$ is a distribution over an uncountable set, there is a nonzero probability that two samples will have exactly the same value, because the probability mass will concentrate on a small number of tables.
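
Both properties can be observed in a direct simulation of the table dynamics; in the sketch below the values $\alpha = 2$ and 10,000 customers are illustrative:

```python
import random

def chinese_restaurant(n_customers, alpha):
    """Seat customers one at a time: customer n joins an occupied table
    with probability proportional to its size and opens a new table with
    probability alpha/(alpha + n - 1). Returns the table sizes."""
    tables = []
    for n in range(1, n_customers + 1):
        r = random.uniform(0.0, alpha + n - 1)
        if not tables or r < alpha:
            tables.append(1)          # open a new table
        else:
            r -= alpha
            for t, size in enumerate(tables):
                if r < size:
                    tables[t] += 1    # join an existing table
                    break
                r -= size
    return tables

random.seed(0)
sizes = chinese_restaurant(10_000, alpha=2.0)
print(len(sizes), "occupied tables; largest table:", max(sizes))
```

The number of occupied tables grows only logarithmically with the number of customers (its expectation is $\sum_{n=1}^{N} \alpha/(\alpha + n - 1) \approx \alpha \log(1 + N/\alpha)$), while a few early tables accumulate most of the customers.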

The stick-breaking process


A third approach to the Dirichlet process is the so-called stick-breaking process view. Conceptually, this involves repeatedly breaking off and discarding a random fraction (sampled from a beta distribution) of a "stick" that is initially of length 1. Remember that draws from a Dirichlet process are distributions over a set $S$. As noted previously, the distribution drawn is discrete with probability 1. In the stick-breaking process view, we explicitly use the discreteness and give the probability mass function of this (random) discrete distribution as:

$$f(x) = \sum_{k=1}^{\infty} \beta_k \delta_{x_k}(x),$$

where $\delta_{x_k}$ is the indicator function which evaluates to zero everywhere, except for $\delta_{x_k}(x_k) = 1$. Since this distribution is random itself, its mass function is parameterized by two sets of random variables: the locations $\{x_k\}_{k=1}^{\infty}$ and the corresponding probabilities $\{\beta_k\}_{k=1}^{\infty}$. In the following, we present without proof what these random variables are.

The locations $x_k$ are independent and identically distributed according to $H$, the base distribution of the Dirichlet process. The probabilities $\beta_k$ are given by a procedure resembling the breaking of a unit-length stick (hence the name):

$$\beta_k = \beta'_k \prod_{i=1}^{k-1} (1 - \beta'_i),$$

where the $\beta'_k$ are independent random variables with the beta distribution $\mathrm{Beta}(1, \alpha)$. The resemblance to 'stick-breaking' can be seen by considering $\beta_k$ as the length of a piece of a stick. We start with a unit-length stick and in each step we break off a portion of the remaining stick according to $\beta'_k$ and assign this broken-off piece to $\beta_k$. The formula can be understood by noting that after the first $k - 1$ values have their portions assigned, the length of the remainder of the stick is $\prod_{i=1}^{k-1} (1 - \beta'_i)$ and this piece is broken according to $\beta'_k$ and gets assigned to $\beta_k$.

The smaller $\alpha$ is, the less of the stick will be left for subsequent values (on average), yielding more concentrated distributions.
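
In practice one truncates the infinite sum after finitely many pieces. The following sketch (the truncation level, the $N(0,1)$ base distribution and $\alpha = 10$ are illustrative assumptions) returns the atoms and weights of an approximate draw:

```python
import numpy as np

def stick_breaking(alpha, base_sample, n_atoms):
    """Truncated stick-breaking draw from DP(H, alpha):
    beta'_k ~ Beta(1, alpha) and beta_k = beta'_k * prod_{i<k} (1 - beta'_i)."""
    fractions = np.random.beta(1.0, alpha, size=n_atoms)
    remaining = np.concatenate(([1.0], np.cumprod(1.0 - fractions)[:-1]))
    weights = fractions * remaining
    weights[-1] += 1.0 - weights.sum()   # fold the truncated tail into the last atom
    return base_sample(n_atoms), weights

atoms, weights = stick_breaking(10.0, lambda n: np.random.normal(0.0, 1.0, n), 500)
print("total mass:", weights.sum())
```

The leftover stick length after $k$ pieces is $\prod_{i=1}^{k} (1 - \beta'_i)$, which shrinks geometrically in expectation, so a modest truncation level already captures almost all of the mass.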

The stick-breaking process is similar to the construction where one samples sequentially from marginal beta distributions in order to generate a sample from a Dirichlet distribution.[4]

The Pólya urn scheme


Yet another way to visualize the Dirichlet process and Chinese restaurant process is as a modified Pólya urn scheme, sometimes called the Blackwell–MacQueen sampling scheme. Imagine that we start with an urn filled with $\alpha$ black balls. Then we proceed as follows:

  1. Each time we need an observation, we draw a ball from the urn.
  2. If the ball is black, we generate a new (non-black) colour uniformly, label a new ball this colour, drop the new ball into the urn along with the ball we drew, and return the colour we generated.
  3. Otherwise, label a new ball with the colour of the ball we drew, drop the new ball into the urn along with the ball we drew, and return the colour we observed.

The resulting distribution over colours is the same as the distribution over tables in the Chinese restaurant process. Furthermore, when we draw a black ball, if rather than generating a new colour, we instead pick a random value from a base distribution and use that value to label the new ball, the resulting distribution over labels will be the same as the distribution over the values in a Dirichlet process.
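
The urn scheme translates almost line by line into code. In this sketch the base distribution $N(0,1)$ and $\alpha = 3$ are illustrative; note that a non-integer $\alpha$ poses no problem, because the black balls enter only through their total weight:

```python
import random

def blackwell_macqueen(base_sample, alpha, n_draws):
    """Polya-urn (Blackwell-MacQueen) sampling: with weight alpha we 'draw a
    black ball' and create a fresh label from the base distribution; otherwise
    we copy the label of a uniformly chosen ball already in the urn."""
    urn, labels = [], []
    for _ in range(n_draws):
        if not urn or random.uniform(0.0, alpha + len(urn)) < alpha:
            label = base_sample()        # black ball: new colour/label from H
        else:
            label = random.choice(urn)   # existing ball: reuse its label
        urn.append(label)                # the newly labelled ball joins the urn
        labels.append(label)
    return labels

random.seed(1)
out = blackwell_macqueen(lambda: random.gauss(0.0, 1.0), alpha=3.0, n_draws=1000)
print(len(set(out)), "distinct labels among 1000 draws")
```

This is the same predictive law as the sequential algorithm in the introduction, just organized around the urn: the urn contents encode the counts $n_x$.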

Use as a prior distribution


The Dirichlet process can be used as a prior distribution to estimate the probability distribution that generates the data. In this section, we consider the model

$$X_1, X_2, \ldots \mid P \overset{\text{i.i.d.}}{\sim} P, \qquad P \sim \mathrm{DP}(H, \alpha).$$

The Dirichlet process prior satisfies prior conjugacy, posterior consistency, and a Bernstein–von Mises theorem.[5]

Prior conjugacy


In this model, the posterior distribution is again a Dirichlet process. This means that the Dirichlet process is a conjugate prior for this model. The posterior distribution is given by

$$P \mid X_1, \ldots, X_n \sim \mathrm{DP}\left(\frac{\alpha}{\alpha + n} H + \frac{n}{\alpha + n} \mathbb{P}_n,\ \alpha + n\right),$$

where $\mathbb{P}_n = \frac{1}{n} \sum_{i=1}^{n} \delta_{X_i}$ is the empirical distribution.
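
Because the base measure of the posterior process is also its expected value, the posterior mean of $P((-\infty, t])$ is a simple convex combination of the prior CDF and the empirical CDF. A small sketch with illustrative data, $H = N(0, 1)$ and $\alpha = 5$:

```python
import numpy as np
from scipy.stats import norm

alpha = 5.0
data = np.array([0.3, 0.5, 0.7, 2.1])   # illustrative observations
n = len(data)

def posterior_mean_cdf(t):
    """E[P((-inf, t]) | data] = (alpha * H((-inf, t]) + n * empirical) / (alpha + n)."""
    empirical = (data <= t).mean()
    return (alpha * norm.cdf(t) + n * empirical) / (alpha + n)

for t in (0.0, 1.0, 2.0):
    print(f"t = {t}: posterior mean CDF = {posterior_mean_cdf(t):.3f}")
```

As $n$ grows the empirical term dominates, which is the mechanism behind the consistency result discussed next.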

Posterior consistency


If we take the frequentist view of probability, we believe there is a true probability distribution $P_0$ that generated the data. Then it turns out that the Dirichlet process is consistent in the weak topology, which means that for every weak neighbourhood $U$ of $P_0$, the posterior probability of $U$ converges to 1.

Bernstein–Von Mises theorem


In order to interpret the credible sets as confidence sets, a Bernstein–von Mises theorem is needed. In the case of the Dirichlet process we compare the posterior distribution with the empirical process $\mathbb{P}_n$. Suppose $\mathcal{F}$ is a $P_0$-Donsker class, i.e.

$$\sqrt{n}(\mathbb{P}_n - P_0) \rightsquigarrow G_{P_0}$$

for some Brownian bridge $G_{P_0}$. Suppose also that there exists a function $F$ with $F(x) \ge \sup_{f \in \mathcal{F}} |f(x)|$ such that $\int F^2 \, \mathrm{d}P_0 < \infty$. Then, almost surely,

$$\sqrt{n}(P - \mathbb{P}_n) \mid X_1, \ldots, X_n \rightsquigarrow G_{P_0}.$$
This implies that the credible sets one constructs are asymptotic confidence sets, and Bayesian inference based on the Dirichlet process is asymptotically also valid frequentist inference.

Use in Dirichlet mixture models

Simulation of 1000 observations drawn from a Dirichlet mixture model. Each observation within a cluster is drawn independently from a multivariate normal distribution. The cluster means are drawn from a distribution $G$ which itself is drawn from a Dirichlet process with concentration parameter $\alpha$ and base distribution $H$. Each row is a new simulation.

To understand what Dirichlet processes are and the problem they solve we consider the example of data clustering. It is a common situation that data points are assumed to be distributed in a hierarchical fashion where each data point belongs to a (randomly chosen) cluster and the members of a cluster are further distributed randomly within that cluster.

Example 1


For example, we might be interested in how people will vote on a number of questions in an upcoming election. A reasonable model for this situation might be to classify each voter as a liberal, a conservative or a moderate and then model the event that a voter says "Yes" to any particular question as a Bernoulli random variable with the probability dependent on which political cluster they belong to. By looking at how votes were cast in previous years on similar pieces of legislation one could fit a predictive model using a simple clustering algorithm such as k-means. That algorithm, however, requires knowing in advance the number of clusters that generated the data. In many situations, it is not possible to determine this ahead of time, and even when we can reasonably assume a number of clusters we would still like to be able to check this assumption. For example, in the voting example above the division into liberal, conservative and moderate might not be finely tuned enough; attributes such as religion, class or race could also be critical for modelling voter behaviour, resulting in more clusters in the model.

Example 2


As another example, we might be interested in modelling the velocities of galaxies using a simple model assuming that the velocities are clustered, for instance by assuming each velocity is distributed according to the normal distribution $v_i \sim N(\mu_{z_i}, \sigma^2)$, where the $i$th observation belongs to the $z_i$th cluster of galaxies with common expected velocity. In this case it is far from obvious how to determine a priori how many clusters (of common velocities) there should be and any model for this would be highly suspect and should be checked against the data. By using a Dirichlet process prior for the distribution of cluster means we circumvent the need to explicitly specify ahead of time how many clusters there are, although the concentration parameter still controls it implicitly.

We consider this example in more detail. A first naive model is to presuppose that there are $K$ clusters of normally distributed velocities with common known fixed variance $\sigma^2$. Denoting the event that the $i$th observation is in the $k$th cluster as $z_i = k$ we can write this model as:

$$
\begin{aligned}
v_i \mid z_i = k, \mu_k &\sim N(\mu_k, \sigma^2) \\
\mathbb{P}(z_i = k) &= \pi_k \\
\boldsymbol{\pi} \mid \alpha &\sim \mathrm{Dir}\left(\tfrac{\alpha}{K} \cdot \mathbf{1}_K\right) \\
\mu_k &\sim H(\lambda)
\end{aligned}
$$

That is, we assume that the data belongs to $K$ distinct clusters with means $\mu_k$ and that $\pi_k$ is the (unknown) prior probability of a data point belonging to the $k$th cluster. We assume that we have no initial information distinguishing the clusters, which is captured by the symmetric prior $\mathrm{Dir}(\alpha/K \cdot \mathbf{1}_K)$. Here $\mathrm{Dir}$ denotes the Dirichlet distribution and $\mathbf{1}_K$ denotes a vector of length $K$ where each element is 1. We further assign independent and identical prior distributions $H(\lambda)$ to each of the cluster means, where $H$ may be any parametric distribution with parameters denoted as $\lambda$. The hyper-parameters $\alpha$ and $\lambda$ are taken to be known fixed constants, chosen to reflect our prior beliefs about the system. To understand the connection to Dirichlet process priors we rewrite this model in an equivalent but more suggestive form:

$$
\begin{aligned}
v_i \mid \tilde{\mu}_i &\sim N(\tilde{\mu}_i, \sigma^2) \\
\tilde{\mu}_i &\sim G = \sum_{k=1}^{K} \pi_k \delta_{\mu_k}(\tilde{\mu}_i) \\
\boldsymbol{\pi} \mid \alpha &\sim \mathrm{Dir}\left(\tfrac{\alpha}{K} \cdot \mathbf{1}_K\right) \\
\mu_k &\sim H(\lambda)
\end{aligned}
$$

Instead of imagining that each data point is first assigned a cluster and then drawn from the distribution associated to that cluster, we now think of each observation as being associated with a parameter $\tilde{\mu}_i$ drawn from some discrete distribution $G$ with support on the $K$ means. That is, we are now treating the $\tilde{\mu}_i$ as being drawn from the random distribution $G$ and our prior information is incorporated into the model by the distribution over distributions $G$.

Animation of the clustering process for one-dimensional data using Gaussian distributions drawn from a Dirichlet process. The histograms of the clusters are shown in different colours. During the parameter estimation process, new clusters are created and grow on the data. The legend shows the cluster colours and the number of datapoints assigned to each cluster.

We would now like to extend this model to work without pre-specifying a fixed number of clusters $K$. Mathematically, this means we would like to select a random prior distribution $G(\tilde{\mu}_i) = \sum_{k=1}^{\infty} \pi_k \delta_{\mu_k}(\tilde{\mu}_i)$ where the values of the cluster means $\mu_k$ are again independently distributed according to $H(\lambda)$ and the distribution over $\boldsymbol{\pi}$ is symmetric over the infinite set of clusters. This is exactly what is accomplished by the model:

$$
\begin{aligned}
v_i \mid \tilde{\mu}_i &\sim N(\tilde{\mu}_i, \sigma^2) \\
\tilde{\mu}_i &\sim G \\
G &\sim \mathrm{DP}(H(\lambda), \alpha)
\end{aligned}
$$

With this in hand we can better understand the computational merits of the Dirichlet process. Suppose that we wanted to draw $n$ observations from the naive model with exactly $K$ clusters. A simple algorithm for doing this would be to draw the $K$ values of $\mu_k$ from $H(\lambda)$, a distribution $\boldsymbol{\pi}$ from $\mathrm{Dir}(\alpha/K \cdot \mathbf{1}_K)$, and then for each observation independently sample the cluster $k$ with probability $\pi_k$ and the value of the observation according to $N(\mu_k, \sigma^2)$. It is easy to see that this algorithm does not work in the case where we allow infinitely many clusters, because this would require sampling the infinite-dimensional parameter $\boldsymbol{\pi}$. However, it is still possible to sample observations $v_i$: one can, for example, use the Chinese restaurant representation described above and calculate the probabilities of the already-used clusters and of a new cluster being created, as needed. This avoids having to explicitly specify $\boldsymbol{\pi}$. Other solutions are based on a truncation of clusters: a (high) upper bound on the true number of clusters is introduced and cluster numbers higher than this bound are treated as one cluster.
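
A minimal sketch of the Chinese-restaurant sampling strategy follows; the concrete choices $H(\lambda) = N(0, 2^2)$ for the cluster means, $\sigma = 0.5$ and $\alpha = 1$ are assumptions made for the example:

```python
import random

def dp_mixture_sample(n_obs, alpha, sigma=0.5):
    """Generate observations from a DP Gaussian mixture via the Chinese
    restaurant representation: assignments follow the CRP, and every new
    cluster receives a mean drawn from H(lambda) = N(0, 2^2)."""
    counts, means, obs = [], [], []
    for n in range(1, n_obs + 1):
        r = random.uniform(0.0, alpha + n - 1)
        if not counts or r < alpha:
            counts.append(1)                      # open a new cluster ...
            means.append(random.gauss(0.0, 2.0))  # ... with mean drawn from H
            k = len(means) - 1
        else:
            r -= alpha                            # join cluster k with
            for k, c in enumerate(counts):        # probability c/(alpha + n - 1)
                if r < c:
                    counts[k] += 1
                    break
                r -= c
        obs.append(random.gauss(means[k], sigma))  # v_i ~ N(mu_k, sigma^2)
    return obs, counts

random.seed(2)
data, cluster_sizes = dp_mixture_sample(1000, alpha=1.0)
print(len(cluster_sizes), "clusters; five largest:", sorted(cluster_sizes, reverse=True)[:5])
```

Cluster means are created lazily, exactly when the Chinese restaurant representation opens a new table, so the infinite vector $\boldsymbol{\pi}$ never has to be instantiated.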

Fitting the model described above based on observed data means finding the posterior distribution over cluster probabilities and their associated means. In the infinite dimensional case it is obviously impossible to write down the posterior explicitly. It is, however, possible to draw samples from this posterior using a modified Gibbs sampler.[6] This is the critical fact that makes the Dirichlet process prior useful for inference.
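
The details of such samplers vary by model. As a deliberately simplified illustration (not the specific algorithm of the cited reference), the sketch below implements a collapsed Gibbs sampler for the one-dimensional normal–normal case, assuming a known observation variance $\sigma^2$ and base distribution $N(0, \tau^2)$, so that the cluster means can be integrated out and only the assignments are resampled:

```python
import math
import random

def normal_pdf(x, mean, var):
    return math.exp(-(x - mean) ** 2 / (2 * var)) / math.sqrt(2 * math.pi * var)

def collapsed_gibbs(data, alpha, sigma2=0.25, tau2=4.0, n_iters=100):
    """Collapsed Gibbs for a DP mixture of N(mu, sigma2) with mu ~ N(0, tau2).
    The means are integrated out; only the assignments z are resampled."""
    z = [0] * len(data)
    stats = {0: [len(data), sum(data)]}            # cluster -> [count, sum]
    for _ in range(n_iters):
        for i, x in enumerate(data):
            # Remove x from its current cluster.
            stats[z[i]][0] -= 1
            stats[z[i]][1] -= x
            if stats[z[i]][0] == 0:
                del stats[z[i]]
            # Weight existing clusters by size * posterior predictive density,
            # and a new cluster by alpha * prior predictive density.
            weights, labels = [], []
            for k, (count, total) in stats.items():
                post_var = 1.0 / (1.0 / tau2 + count / sigma2)
                post_mean = post_var * total / sigma2
                weights.append(count * normal_pdf(x, post_mean, post_var + sigma2))
                labels.append(k)
            weights.append(alpha * normal_pdf(x, 0.0, tau2 + sigma2))
            labels.append(max(stats, default=-1) + 1)
            # Sample the new assignment.
            r = random.uniform(0.0, sum(weights))
            for w, k in zip(weights, labels):
                if r < w:
                    break
                r -= w
            z[i] = k
            stats.setdefault(k, [0, 0.0])
            stats[k][0] += 1
            stats[k][1] += x
    return z

random.seed(3)
data = [random.gauss(-2, 0.5) for _ in range(50)] + [random.gauss(2, 0.5) for _ in range(50)]
print("clusters found:", len(set(collapsed_gibbs(data, alpha=1.0))))
```

Each sweep reassigns every observation given all the others, with an existing cluster weighted by its size times the posterior predictive density of the observation, and a new cluster weighted by $\alpha$ times the prior predictive density.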

Applications of the Dirichlet process


Dirichlet processes are frequently used in Bayesian nonparametric statistics. "Nonparametric" here does not mean a parameter-less model, but rather a model in which representations grow as more data are observed. Bayesian nonparametric models have gained considerable popularity in the field of machine learning because of this flexibility, especially in unsupervised learning. In a Bayesian nonparametric model, the prior and posterior distributions are not parametric distributions, but stochastic processes.[7] The fact that the Dirichlet distribution is a probability distribution on the simplex of non-negative vectors that sum to one makes it a good candidate for modelling distributions over distributions or distributions over functions. Additionally, the nonparametric nature of this model makes it an ideal candidate for clustering problems where the distinct number of clusters is unknown beforehand. The Dirichlet process has also been used for developing mixture-of-experts models in the context of supervised learning algorithms (regression or classification settings), for instance mixtures of Gaussian process experts, where the number of required experts must be inferred from the data.[8][9]

As draws from a Dirichlet process are discrete, an important use is as a prior probability in infinite mixture models. In this case, $S$ is the parametric set of component distributions. The generative process is therefore that a sample is drawn from a Dirichlet process, and for each data point, in turn, a value is drawn from this sample distribution and used as the component distribution for that data point. The fact that there is no limit to the number of distinct components which may be generated makes this kind of model appropriate for the case when the number of mixture components is not well-defined in advance. Examples include the infinite mixture of Gaussians model,[10] as well as associated mixture regression models.[11]

The infinite nature of these models also lends them to natural language processing applications, where it is often desirable to treat the vocabulary as an infinite, discrete set.

The Dirichlet process can also be used for nonparametric hypothesis testing, i.e. to develop Bayesian nonparametric versions of the classical nonparametric hypothesis tests, e.g. the sign test, Wilcoxon rank-sum test, Wilcoxon signed-rank test, etc. For instance, Bayesian nonparametric versions of the Wilcoxon rank-sum test and the Wilcoxon signed-rank test have been developed using the imprecise Dirichlet process, a prior-ignorance Dirichlet process.[citation needed]


References

  1. ^ Frigyik, Bela A.; Kapila, Amol; Gupta, Maya R. "Introduction to the Dirichlet Distribution and Related Processes" (PDF). Retrieved 2 September 2021.
  2. ^ Ferguson, Thomas (1973). "Bayesian analysis of some nonparametric problems". Annals of Statistics. 1 (2): 209–230. doi:10.1214/aos/1176342360. MR 0350949.
  3. ^ "Dirichlet Process and Dirichlet Distribution – Polya Restaurant Scheme and Chinese Restaurant Process".
  4. ^ For the proof, see Paisley, John (August 2010). "A simple proof of the stick-breaking construction of the Dirichlet Process" (PDF). Columbia University. Archived from the original (PDF) on January 22, 2015.
  5. ^ Aad van der Vaart, Subhashis Ghosal (2017). Fundamentals of Bayesian Nonparametric Inference. Cambridge University Press. ISBN 978-0-521-87826-5.
  6. ^ Sudderth, Erik (2006). Graphical Models for Visual Object Recognition and Tracking (PDF) (Ph.D.). MIT Press.
  7. ^ Nils Lid Hjort; Chris Holmes, Peter Müller; Stephen G. Walker (2010). Bayesian Nonparametrics. Cambridge University Press. ISBN 978-0-521-51346-3.
  8. ^ Sotirios P. Chatzis, "A Latent Variable Gaussian Process Model with Pitman-Yor Process Priors for Multiclass Classification," Neurocomputing, vol. 120, pp. 482–489, Nov. 2013. doi:10.1016/j.neucom.2013.04.029
  9. ^ Sotirios P. Chatzis, Yiannis Demiris, "Nonparametric mixtures of Gaussian processes with power-law behaviour," IEEE Transactions on Neural Networks and Learning Systems, vol. 23, no. 12, pp. 1862–1871, Dec. 2012. doi:10.1109/TNNLS.2012.2217986
  10. ^ Rasmussen, Carl (2000). "The Infinite Gaussian Mixture Model" (PDF). Advances in Neural Information Processing Systems. 12: 554–560.
  11. ^ Sotirios P. Chatzis, Dimitrios Korkinof, and Yiannis Demiris, "A nonparametric Bayesian approach toward robot learning by demonstration," Robotics and Autonomous Systems, vol. 60, no. 6, pp. 789–802, June 2012. doi:10.1016/j.robot.2012.02.005