ОблоТка канала

Physics.Math.Code

60543 @physics_lib

БообщСство Ρ„ΠΈΠ·ΠΈΠΊΠΎΠ², ΠΌΠ°Ρ‚Π΅ΠΌΠ°Ρ‚ΠΈΠΊΠΎΠ² ΠΈ Ρ€Π°Π·Ρ€Π°Π±ΠΎΡ‚Ρ‡ΠΈΠΊΠΎΠ². Книги, Π²ΠΈΠ΄Π΅ΠΎΡƒΡ€ΠΎΠΊΠΈ, ΡΡ‚Π°Ρ‚ΡŒΠΈ.

Physics.Math.Code

4 Π³ΠΎΠ΄Π° Π½Π°Π·Π°Π΄
ΠžΡ‚ΠΊΡ€Ρ‹Ρ‚ΡŒ Π²
πŸ“• ΠžΠ±ΡƒΡ‡Π΅Π½ΠΈΠ΅ с ΠΏΠΎΠ΄ΠΊΡ€Π΅ΠΏΠ»Π΅Π½ΠΈΠ΅ΠΌ для Ρ€Π΅Π°Π»ΡŒΠ½Ρ‹Ρ… Π·Π°Π΄Π°Ρ‡. Π˜Π½ΠΆΠ΅Π½Π΅Ρ€Π½Ρ‹ΠΉ ΠΏΠΎΠ΄Ρ…ΠΎΠ΄ [2023] Π€ΠΈΠ» Π£ΠΈΠ½Π΄Π΅Ρ€ Книга посвящСна ΠΏΡ€ΠΎΠΌΡ‹ΡˆΠ»Π΅Π½Π½ΠΎ-ΠΎΡ€ΠΈΠ΅Π½Ρ‚ΠΈΡ€ΠΎΠ²Π°Π½Π½ΠΎΠΌΡƒ ΠΏΡ€ΠΈΠΌΠ΅Π½Π΅Π½ΠΈΡŽ обучСния с ΠΏΠΎΠ΄ΠΊΡ€Π΅ΠΏΠ»Π΅Π½ΠΈΠ΅ΠΌ (Reinforcement Learning, RL). ОбъяснСно, ΠΊΠ°ΠΊ ΠΎΠ±ΡƒΡ‡Π°Ρ‚ΡŒ ΠΏΡ€ΠΎΠΌΡ‹ΡˆΠ»Π΅Π½Π½Ρ‹Π΅ ΠΈ Π½Π°ΡƒΡ‡Π½Ρ‹Π΅ систСмы Ρ€Π΅ΡˆΠ΅Π½ΠΈΡŽ Π»ΡŽΠ±Ρ‹Ρ… ΠΏΠΎΡˆΠ°Π³ΠΎΠ²Ρ‹Ρ… Π·Π°Π΄Π°Ρ‡ ΠΌΠ΅Ρ‚ΠΎΠ΄ΠΎΠΌ ΠΏΡ€ΠΎΠ± ΠΈ ошибок – Π±Π΅Π· ΠΏΠΎΠ΄Π³ΠΎΡ‚ΠΎΠ²ΠΊΠΈ узкоспСциализированных ΡƒΡ‡Π΅Π±Π½Ρ‹Ρ… мноТСств Π΄Π°Π½Π½Ρ‹Ρ… ΠΈ Π±Π΅Π· риска ΠΏΠ΅Ρ€Π΅ΠΎΠ±ΡƒΡ‡ΠΈΡ‚ΡŒ ΠΈΠ»ΠΈ ΠΏΠ΅Ρ€Π΅ΡƒΡΠ»ΠΎΠΆΠ½ΠΈΡ‚ΡŒ Π°Π»Π³ΠΎΡ€ΠΈΡ‚ΠΌ. РассмотрСны марковскиС процСссы принятия Ρ€Π΅ΡˆΠ΅Π½ΠΈΠΉ, Π³Π»ΡƒΠ±ΠΎΠΊΠΈΠ΅ Q-сСти, Π³Ρ€Π°Π΄ΠΈΠ΅Π½Ρ‚Ρ‹ ΠΏΠΎΠ»ΠΈΡ‚ΠΈΠΊ ΠΈ ΠΈΡ… вычислСниС, ΠΌΠ΅Ρ‚ΠΎΠ΄Ρ‹ устранСния энтропии ΠΈ ΠΌΠ½ΠΎΠ³ΠΎΠ΅ Π΄Ρ€ΡƒΠ³ΠΎΠ΅. Данная ΠΊΠ½ΠΈΠ³Π° – пСрвая Π½Π° русском языкС, Π³Π΄Π΅ тСорСтичСский базис RL ΠΈ Π°Π»Π³ΠΎΡ€ΠΈΡ‚ΠΌΡ‹ Π΄Π°Π½Ρ‹ Π² ΠΏΡ€ΠΈΠΊΠ»Π°Π΄Π½ΠΎΠΌ, отраслСвом ΠΊΠ»ΡŽΡ‡Π΅. Для Π°Π½Π°Π»ΠΈΡ‚ΠΈΠΊΠΎΠ² Π΄Π°Π½Π½Ρ‹Ρ… ΠΈ спСциалистов ΠΏΠΎ искусствСнному ΠΈΠ½Ρ‚Π΅Π»Π»Π΅ΠΊΡ‚Ρƒ. πŸ“˜ Reinforcement Learning: Industrial Applications of Intelligent Agents [2021] Phil Winder, Ph.D. Reinforcement learning (RL) is a machine learning (ML) paradigm that is capable of optimizing sequential decisions. RL is interesting because it mimics how we, as humans, learn. We are instinctively capable of learning strategies that help us master complex tasks like riding a bike or taking a mathematics exam. RL attempts to copy this process by interacting with the environment to learn strategies. Recently, businesses have been applying ML algorithms to make one-shot decisions. These are trained upon data to make the best decision at the time. But often, the right decision at the time may not be the best decision in the long term. Yes, that full tub of ice cream will make you happy in the short term, but you’ll have to do more exercise next week. Similarly, click-bait recommendations might have the highest click-through rates, but in the long term these articles feel like a scam and hurt long-term engagement or retention. RL is exciting because it is possible to learn long-term strategies and apply them to complex industrial problems.