<?xml version="1.0" encoding="utf-8" standalone="yes"?>
<rss version="2.0" xmlns:atom="http://www.w3.org/2005/Atom" xmlns:content="http://purl.org/rss/1.0/modules/content/">
  <channel>
    <title>Policy Gradient on alex's Algorithm and Technology Log</title>
    <link>https://blog.alex-tech.org/tags/Policy-Gradient/</link>
    <description>Recent content in Policy Gradient on alex's Algorithm and Technology Log</description>
    <generator>Hugo</generator>
    <language>zh-cn</language>
    <copyright>alex. Original content is licensed under CC BY-NC 4.0 by default; when reposting, please credit the source and include a link.</copyright>
    <lastBuildDate>Sun, 29 Mar 2026 13:43:32 +0800</lastBuildDate>
    <atom:link href="https://blog.alex-tech.org/tags/Policy-Gradient/index.xml" rel="self" type="application/rss+xml" />
    <item>
      <title>Introduction to Reinforcement Learning: The Training Loop and Loss in Spinning Up's Simplest Policy Gradient (REINFORCE), Explained</title>
      <link>https://blog.alex-tech.org/posts/DeepReinforcementLearning/</link>
      <pubDate>Sun, 29 Mar 2026 13:43:32 +0800</pubDate>
      <guid>https://blog.alex-tech.org/posts/DeepReinforcementLearning/</guid>
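      <description>Walks through how the training loop (episode / epoch) in the Spinning Up simplest policy gradient article maps to the loss and its weights, and ties together the key points of the example code.</description>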
    </item>
  </channel>
</rss>
