Обучение с подкреплением для реальных задач. Инженерный подход (Reinforcement Learning for Real-World Tasks: An Engineering Approach) [2023] Phil Winder
This book covers the industry-oriented application of reinforcement learning (RL). It explains how to train industrial and scientific systems to solve any step-by-step task by trial and error, without preparing narrowly specialized training datasets and without the risk of overfitting or overcomplicating the algorithm. It covers Markov decision processes, deep Q-networks, policy gradients and how to compute them, methods for eliminating entropy, and much more. It is the first book in Russian to present the theoretical foundations of RL and its algorithms in an applied, industry-focused way. For data analysts and artificial intelligence specialists.
Reinforcement Learning: Industrial Applications of Intelligent Agents [2021] Phil Winder, Ph.D.
Reinforcement learning (RL) is a machine learning (ML) paradigm that is capable of optimizing sequential decisions. RL is interesting because it mimics how we, as humans, learn. We are instinctively capable of learning strategies that help us master complex tasks like riding a bike or taking a mathematics exam. RL attempts to copy this process by interacting with the environment to learn strategies. Recently, businesses have been applying ML algorithms to make one-shot decisions. These are trained upon data to make the best decision at the time. But often, the right decision at the time may not be the best decision in the long term. Yes, that full tub of ice cream will make you happy in the short term, but you'll have to do more exercise next week. Similarly, click-bait recommendations might have the highest click-through rates, but in the long term these articles feel like a scam and hurt long-term engagement or retention. RL is exciting because it is possible to learn long-term strategies and apply them to complex industrial problems.