Cognitive Neuroscience; Cognitive Control; Decision Making; Dopamine Activity and Reward Value

Dariush Taheri | Updated: 12 Bahman 1403

Translation of Michael Gazzaniga's Cognitive Neuroscience; chapter on cognitive control

» Cognitive Neuroscience: The Biology of the Mind

»» CHAPTER 12: cognitive control; part three

12.4 Decision Making

Go back to the hot summer day when you thought, “Hmm… that frosty, cold drink is worth looking for. I’m going to get one.” That type of goal-oriented behavior begins with a decision to pursue the goal. We might think of the brain as a decision-making device in which our perceptual and memory systems evolved to support decisions that determine our actions. Our brains start making decisions as soon as our eyes flutter open in the morning: Do I get up now or snooze a bit longer? Do I wear shorts or jeans? Do I skip class to study for an exam? Though humans tend to focus on complex decisions such as who will get their vote in the next election, all animals need to make decisions. Even an earthworm decides when to leave a patch of lawn and move on to greener pastures.

Rational observers, such as economists and mathematicians, tend to be puzzled when they consider human behavior. To them, our behavior frequently appears inconsistent or irrational, not based on what seems to be a sensible evaluation of the circumstances and options. For instance, why would someone who is concerned about eating healthy food consume a jelly doughnut? Why would someone who is paying so much money for tuition skip classes? And why are people willing to spend large sums of money to insure themselves against low-risk events (e.g., buying fire insurance even though the odds are overwhelmingly small that they will ever use it), yet equally willing to engage in high-risk behaviors (e.g., texting while driving)?

The field of neuroeconomics has emerged as an interdisciplinary enterprise with the goal of explaining the neural mechanisms underlying decision making. Economists want to understand how and why we make the choices we do. Many of their ideas can be tested both with behavioral studies and, as in all of cognitive neuroscience, with data from cellular activity, neuroimaging, or lesion studies. This work also helps us understand the functional organization of the brain.

Theories about our decision-making processes are either normative or descriptive. Normative decision theories define how people ought to make decisions that yield the optimal choice. Very often, however, such theories fail to predict what people actually choose. Descriptive decision theories attempt to describe what people actually do, not what they should do.

Our inconsistent, sometimes suboptimal choices present less of a mystery to evolutionary psychologists. Our modular brain has been sculpted by evolution to optimize reproduction and survival in a world that differed quite a bit from the one we currently occupy. In that world, you would never have passed up the easy pickings of a jelly doughnut, something that is sweet and full of fat, or engaged in exercise solely for the sake of burning off valuable fat stores; conserving energy would have been a much more powerful factor. Our current brains reflect this past, drawing on the mechanisms that were essential for survival in a world before readily available food.

Many of these mechanisms, as with all brain functions, putter along below our consciousness. We are unaware that we often make decisions by following simple, efficient rules (heuristics) that were sculpted and hard-coded by evolution. The results of these decisions may not seem rational, at least within the context of our current, highly mechanized world. But they may seem more rational if looked at from an evolutionary perspective.

Consistent with this point of view, the evidence indicates that we reach decisions in many different ways. As we touched on earlier, decisions can be goal oriented or habitual. The distinction is that goal-oriented decisions are based on the assessment of expected reward, whereas habits, by definition, are actions taken that are no longer under the control of reward: We simply execute them because the context triggers the action. A somewhat similar way of classifying decisions is to divide them into action-outcome decisions or stimulus-response decisions. With an action-outcome decision, the decision involves some form of evaluation (not necessarily conscious) of the expected outcomes. After we repeat that action, and if the outcome is consistent, the process becomes habitual; that is, it becomes a stimulus-response decision.

Another distinction can be made between decisions that are model-free or model-based. Model-based means that the agent has an internal representation of some aspect of the world and uses this model to evaluate different actions. For example, a cognitive map would be a model of the spatial layout of the world, enabling you to choose an alternative path if you found the road blocked as you set off for the A&W restaurant. Model-free means that you have only an input-output mapping, similar to stimulus-response decisions. Here you know that to get to the A&W, you simply look for the tall tower at the center of town, which is right next to the A&W.
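
The contrast can be made concrete with a small sketch. The town layout, place names, and cue names below are invented for illustration; the point is only that a model-based agent can search its internal map for a detour, while a model-free agent has nothing but a cue-to-action table.

```python
from collections import deque

# Hypothetical town layout: each place lists the places directly reachable from it.
TOWN_MAP = {
    "home": ["main_st", "river_rd"],
    "main_st": ["home", "tower"],
    "river_rd": ["home", "bridge"],
    "bridge": ["river_rd", "tower"],
    "tower": ["main_st", "bridge", "a_and_w"],
    "a_and_w": ["tower"],
}


def model_based_route(start, goal, blocked=frozenset()):
    """Breadth-first search over the internal map, skipping blocked road segments."""
    queue = deque([[start]])
    visited = {start}
    while queue:
        path = queue.popleft()
        here = path[-1]
        if here == goal:
            return path
        for nxt in TOWN_MAP[here]:
            # A road segment is identified by the unordered pair of its endpoints.
            if nxt not in visited and frozenset((here, nxt)) not in blocked:
                visited.add(nxt)
                queue.append(path + [nxt])
    return None  # no route exists


# Model-free: a fixed cue -> action table, with no knowledge of the layout.
STIMULUS_RESPONSE = {"see_tower": "walk_toward_tower"}


def model_free_action(cue):
    """Return the learned response, or None when the cue is unfamiliar."""
    return STIMULUS_RESPONSE.get(cue)
```

Calling `model_based_route("home", "a_and_w", blocked={frozenset(("home", "main_st"))})` returns a detour via the bridge, whereas `model_free_action` has no answer for any cue it has not already been trained on.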

Decisions that involve other people are known as social decisions. Dealing with other individuals tends to make things much more complicated, a topic we will return to in Chapters 13 and 14.

Is It Worth It? Value and Decision Making

A cornerstone idea in economic models of decision making is that before we make a decision, we first compute the value of each option and then compare the different values in some way (Padoa-Schioppa, 2011). Decision making in this framework is about making choices that will maximize value. For example, we want to obtain the highest possible reward or payoff (Figure 12.10). It is not enough, however, to think only about the possible reward level. We also have to consider the likelihood of receiving the reward, as well as the costs required to obtain that reward. Although many lottery players dream of winning the million-dollar prize, some may forgo a chance at the big money and buy a ticket with a maximum payoff of a hundred dollars, knowing that their odds of winning are much higher.
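
One simple way to combine reward level, likelihood, and cost is an expected-value calculation. The sketch below uses that formula as an assumption (the text does not commit to a specific equation), and the ticket odds and prices are invented for illustration.

```python
def expected_value(payoff, probability, cost):
    """Payoff weighted by the chance of getting it, minus what it costs to try."""
    return probability * payoff - cost


# Hypothetical tickets: the odds and prices are invented for illustration.
jackpot = expected_value(payoff=1_000_000, probability=1e-7, cost=2.0)
small_prize = expected_value(payoff=100, probability=0.01, cost=2.0)

print(f"jackpot ticket: {jackpot:.2f}, small-prize ticket: {small_prize:.2f}")
```

With these made-up numbers both tickets have a negative expected value, but the small-prize ticket loses less on average, which is one way to read the preference for better odds over a bigger prize.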

FIGURE 12.10 Decisions require the integration and evaluation of multiple factors.
In this example, the person is asked to choose between two options, each of which has an inferred value. The values are determined by a weighted combination of multiple sources of information. Some sources are external to the agent: for example, what will I gain (commodity), how much reward will be obtained (quantity), will I get the reward right away (delay), and how certain am I to obtain the reward (risk)? Other factors are internal to the agent: for example, am I feeling motivated (motivation), am I willing to wait for the reward (patience), is the risk worth it (risk attitude)?

COMPONENTS OF VALUE

To figure out the neural processes involved in decision making, we need to understand how the brain computes value and processes rewards. Some rewards, such as food, water, or sex, are primary reinforcers: They have a direct benefit for survival fitness. Their value, or our response to these reinforcers, is, to some extent, hardwired in our genetic code. But reward value is also flexible and shaped by experience. If you are truly starving, an item of disgust (say, a dead mouse) suddenly takes on reinforcing properties. Secondary reinforcers, such as money and status, are rewards that have no intrinsic value themselves, but become rewarding through their association with other forms of reinforcement.

Reward value is not a simple calculation. Value has various components, both external and internal, that are integrated to form an overall subjective worth. Consider this scenario: You are out fishing along the shoreline and thinking about whether to walk around the lake to an out-of-the-way fishing hole. Do you stay put or pack up your gear? Establishing the value of these options requires considering several factors, all of which contribute to the representation of value:

Payoff. What kind of reward do the options offer, and how large is the reward? At the current spot, you might land a small trout or perhaps a bream. At the other spot, you’ve caught a few largemouth bass.

Probability. How likely are you to attain the reward? You might remember that the current spot almost always yields a few catches, whereas you’ve most often come back empty-handed from the secret hole.

Effort or cost. If you stay put, you can start casting right away. Getting to the fishing hole on the other side of the lake will take an hour of scrambling up and down the hillside. One form of cost that has been widely studied is temporal discounting. How long are you willing to wait for a reward? You may not catch large fish at the current spot, but you could feel that satisfying tug 60 minutes sooner if you stayed where you are.

Context. This factor involves external things, like the time of day, as well as internal things, such as whether you are hungry or tired, or looking forward to an afternoon outing with some friends. Context also includes novelty: you might be the type who values an adventure and the possibility of finding an even better fishing hole on your way to the other side of the lake, or you might be feeling cautious, eager to go with a proven winner.

Preference. You may just like one fishing spot better than another for its aesthetics or a fond memory.
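
The factors listed above can be folded into a single number in many ways. The sketch below is one toy integration rule, chosen only for illustration: the functional form, the parameter names (`k_per_min`, `context_gain`), and every number are assumptions, not values from the text.

```python
from dataclasses import dataclass


@dataclass
class Option:
    payoff: float        # how good the catch is, arbitrary units
    probability: float   # chance of catching anything
    effort: float        # cost of getting there, same units as payoff
    delay_min: float     # minutes until the first bite is possible
    preference: float    # bonus for simply liking the spot


def subjective_value(opt, k_per_min=0.01, context_gain=1.0):
    """Toy rule: discounted expected payoff, scaled by context, minus effort, plus preference."""
    discounted = opt.payoff * opt.probability / (1.0 + k_per_min * opt.delay_min)
    return context_gain * discounted - opt.effort + opt.preference


# All numbers are invented for illustration.
current_spot = Option(payoff=3.0, probability=0.9, effort=0.0, delay_min=0.0, preference=0.0)
secret_hole = Option(payoff=10.0, probability=0.3, effort=1.0, delay_min=60.0, preference=0.5)

for name, opt in [("stay", current_spot), ("hike", secret_hole)]:
    print(name, round(subjective_value(opt), 3))
```

With these settings staying put scores higher. Lowering `k_per_min` (a more patient angler) or raising `preference` narrows or reverses the gap, which mirrors the point that the same choice can look different from person to person and hour to hour.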

As you can see, many factors contribute to subjective value, and they can change immensely from person to person and hour to hour. Given such variation, it is not so surprising that people are highly inconsistent in their decision-making behavior. What seems irrational thinking by another individual might not be, if we could peek into that person’s up-to-date value representation of the current choices.

REPRESENTATION OF VALUE

How and where is value represented in the brain? Jon Wallis and his colleagues (Kennerley et al., 2009) looked at value representation in the frontal lobes of monkey brains, targeting regions associated with decision making and goal-oriented behavior. While the monkey performed decision-making tasks, the investigators used multiple electrodes to record from cells in three regions: the anterior cingulate cortex (ACC), the lateral prefrontal cortex (LPFC), and the orbitofrontal cortex (OFC). Besides comparing cellular activity in different locations, the experimenters manipulated cost, probability, and payoff. The key question was whether the different areas would show selectivity to particular dimensions. For instance, would OFC be selective to payoff, LPFC to probability, and ACC to cost? Or, would there be an area that coded overall “value” independent of the variable?

As is often observed in neurophysiology studies, the results were quite nuanced. Each of the three regions included cells that responded selectively to a particular dimension, as well as cells that responded to multiple dimensions. Many cells, especially in the ACC, responded to all three dimensions (Figure 12.11). A pattern like this suggests that these cells represent an overall measure of value. In contrast, LPFC cells usually encoded just a single decision variable, with a preference for probability. This pattern might reflect the role of this area in working memory, since probability judgments require integrating the consequences of actions over time. In contrast, OFC neurons had a bias to be tuned to payoff, reflecting the amount of reward associated with each stimulus item.

Similar studies with human participants have been conducted with fMRI. Here the emphasis has been on how different dimensions preferentially activate different neural regions. For example, in one study, OFC activation was closely tied to variation in payoff, whereas activation in the striatum of the basal ganglia was related to effort (Croxson et al., 2009). In another study, LPFC activation was associated with the probability of reward, whereas the delay between the time of the action and the payoff was correlated with activity in the medial PFC and lateral parietal lobe (J. Peters & Buchel, 2009).

A classic finding in behavioral economics, temporal discounting, is the observation that the value of a reward is reduced when we have to wait to receive that reward. For example, if given a choice, most people would prefer to immediately receive a $10 reward rather than wait a month for $12 (even though the second option translates into an annual interest rate of 240%). But make people choose between $10 now or $50 in a month, and almost everyone is willing to wait. For a given delay, there is some crossover reward level where the subjective value of an immediate reward is the same as that of a larger amount to be paid off in the future. What would that number be for you?
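
The arithmetic behind the 240% figure, and the idea of a crossover amount, can be sketched as follows. The interest calculation is simple (non-compounded) annualization. The hyperbolic form V = A / (1 + kD) and the value k = 0.5 per month are assumptions used only to illustrate; the text does not specify a discounting model here.

```python
def simple_annual_rate(now, later, months):
    """Simple (non-compounded) annualized return from waiting for `later` instead of taking `now`."""
    return (later - now) / now * (12 / months)


def hyperbolic_value(amount, delay_months, k):
    """Subjective value of a delayed reward under hyperbolic discounting (an assumed model)."""
    return amount / (1.0 + k * delay_months)


def crossover_amount(now, delay_months, k):
    """Delayed amount whose discounted value equals the immediate amount."""
    return now * (1.0 + k * delay_months)


rate = simple_annual_rate(10, 12, months=1)   # 20% per month, annualized
k = 0.5                                       # hypothetical discount rate per month
print(f"annual rate: {rate:.0%}")
print(f"crossover for $10 now vs. 1 month: ${crossover_amount(10, 1, k):.2f}")
```

For this hypothetical k, the crossover sits at $15: an offer of $12 in a month is worth less than $10 now, while $50 in a month is worth far more, matching the pattern of choices described above.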

Given the association of the OFC with value representation, researchers at the University of Bologna tested people with lesions encompassing this region on temporal discounting tasks (Sellitto et al., 2010). For both food and monetary rewards, the OFC patients showed abnormal temporal discounting in comparison to patients with lesions outside the OFC or healthy control participants (Figure 12.12). Extrapolating from the graph in Figure 12.12c, we can see that a control participant is willing to wait 4 to 6 months if the monetary reward will double (say, $100 rather than $50), while the average OFC patient isn’t willing to wait even 3 weeks. An increase in seemingly impulsive behavior might be a consequence of poor temporal discounting; immediate outcomes are preferred, even though a much more rational choice would be to wait for the bigger payoff.
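
To get a feel for the size of this difference, one can translate the waiting times into discount rates. The sketch assumes a hyperbolic model, V = A / (1 + kD), under which a reward loses half its value at D = 1/k. The delays are rough readings of the text (a 5-month midpoint for controls, and the "about 2 weeks" given in the Figure 12.12 caption for OFC patients), not fitted parameters from the study.

```python
def k_from_half_delay(half_delay_days):
    """Under V = A / (1 + k*D), value halves when k*D = 1, so k = 1 / D_half."""
    return 1.0 / half_delay_days


def discounted_fraction(delay_days, k):
    """Fraction of the nominal reward value that remains after the delay."""
    return 1.0 / (1.0 + k * delay_days)


# Approximate half-value delays taken from the text (assumed midpoints).
k_control = k_from_half_delay(150)   # roughly 5 months
k_ofc = k_from_half_delay(14)        # roughly 2 weeks

print(f"k ratio (OFC / control): {k_ofc / k_control:.1f}")
print(f"value left after 14 days, control: {discounted_fraction(14, k_control):.2f}")
print(f"value left after 14 days, OFC:     {discounted_fraction(14, k_ofc):.2f}")
```

Under these assumptions the OFC group's discount rate is roughly ten times that of controls: a two-week wait barely dents the value for a control participant but halves it for the average OFC patient.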

FIGURE 12.11 Cellular representation of value in the prefrontal cortex.
(a) Simultaneous recordings were made from multiple electrodes that were positioned in lateral prefrontal cortex (red), orbitofrontal cortex (blue), or anterior cingulate cortex (green). (b) Cellular correlates were found for all three dimensions in each of the three regions, although the number of task-relevant neurons is different between regions. The dimensional preference varies between regions, with the LPFC preferring probability over payoff, and the OFC preferring payoff over probability.

FIGURE 12.12 Patients with OFC lesions strongly prefer immediate rewards over delayed rewards.
(a) The participant must choose either to receive an immediate reward of modest value or to wait for a specified delay period in order to receive a larger reward. (b) The locations of orbitofrontal lesions in seven individuals are all projected here onto each of seven different horizontal slices. The color bar indicates the number of lesions affecting each brain region. The white horizontal lines on the sagittal view (bottom right) indicate the level of the horizontal slices, where 23 is the most dorsal. (c) Temporal discounting function. The curve indicates how much a delayed reward is discounted, relative to an immediate reward. The dashed line indicates when the delayed option is discounted by 50%. For example, the healthy controls (green) and patients with lesions outside the frontal cortex (non-FC, blue) are willing to wait 4 to 6 months to receive $100 rather than receiving an immediate payoff of $50. The patients with OFC lesions (red) are willing to wait only about 2 weeks for the larger payoff. Similar behavior is observed if the reward is food or vouchers to attend museums.

The importance of time in decision making is also seen in the all-too-common situation in which an action might have positive immediate benefits but long-term negative consequences. For example, what happens when dieters are given their choice of a tasty but unhealthy treat (like the jelly doughnut) and a healthy but perhaps less tasty one (plain yogurt)? Interestingly, activity in the OFC was correlated with taste preference, regardless of whether the item was healthy (Figure 12.13).

In contrast, the LPFC area was associated with the degree of control (Hare et al., 2009): Activity here was greater on trials in which a preferred but unhealthy item was refused, as compared to trials in which that item was selected. Moreover, this difference was much greater in people who were judged to be better at exhibiting self-control. It may be that the OFC originally evolved to forecast the short-term value of stimuli. Over evolutionary time, structures such as the LPFC began to modulate more primitive, or primary, value signals, providing humans with the ability to incorporate long-term considerations into value representations. These findings also suggest that a fundamental difference between successful and failed self-control might be the extent to which the LPFC can modulate the value signal encoded in the OFC.
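
The modulation idea can be illustrated with a toy model. Here a single hypothetical parameter, `health_weight`, stands in for how strongly LPFC control lets long-term considerations enter the value signal. The additive form, the parameter, and the ratings are all assumptions for illustration and are not the model used by Hare et al.

```python
def ofc_value(taste, health, health_weight):
    """Toy value signal: taste always counts; health counts only as much as control allows."""
    return taste + health_weight * health


def choose(items, health_weight):
    """Pick the item with the highest modulated value."""
    return max(items, key=lambda name: ofc_value(*items[name], health_weight))


# Invented ratings on a -2..2 scale: (taste, health).
ITEMS = {"jelly_doughnut": (2.0, -2.0), "plain_yogurt": (0.5, 1.5)}

print("weak control:  ", choose(ITEMS, health_weight=0.0))
print("strong control:", choose(ITEMS, health_weight=1.0))
```

With no modulation the choice tracks taste alone and the doughnut wins; with strong modulation the health rating shifts the value signal enough that the yogurt is chosen.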

Overall, the neurophysiological and neuroimaging studies indicate that the OFC plays a key role in the representation of value. More lateral regions of the PFC are important for some form of modulatory control of these representations or the actions associated with them. We have seen one difference between the neurophysiological and neuroimaging results: The former emphasize a distributed picture of value representation, and the latter emphasize specialization within components of a decision-making network. The discrepancy, though, is probably due to the differential sensitivity of the two methods. The fine-grained spatial resolution of neurophysiology enables us to ask whether individual cells are sensitive to particular dimensions. In contrast, fMRI studies generally provide relative answers, asking whether an area is more responsive to variation in one dimension compared to another dimension.

به طور کلی، مطالعات نوروفیزیولوژیکی و تصویربرداری عصبی نشان می‌دهد که OFC نقش کلیدی در نمایش ارزش دارد. مناطق جانبی بیشتری از PFC برای نوعی از کنترل تعدیلی این نمایش‌ها یا اقدامات مرتبط با آنها مهم هستند. ما یک تفاوت بین نتایج نوروفیزیولوژیکی و تصویربرداری عصبی دیده‌ایم: اولی بر تصویر توزیع شده از بازنمایی ارزش تأکید می‌کند، و دومی‌بر تخصص در اجزای یک شبکه تصمیم‌گیری تأکید می‌کند. اگرچه این اختلاف احتمالاً به دلیل حساسیت متفاوت دو روش است. تفکیک فضایی ریز فیزیولوژی عصبی ما را قادر می‌سازد تا بپرسیم آیا سلول‌های فردی به ابعاد خاصی حساس هستند یا خیر. در مقابل، مطالعات fMRI به طور کلی پاسخ‌های نسبی را ارائه می‌دهند و می‌پرسند که آیا یک ناحیه نسبت به بعد دیگر به تغییرات در یک بعد پاسخ‌گوتر است یا خیر.

FIGURE 12.13 Dissociation of OFC and LPFC during a food selection task.

شکل ۱۲.۱۳ تفکیک OFC و LPFC در طول یک تکلیف انتخاب غذا.

(a) OFC regions that showed a positive relationship between the BOLD response and food preference. This signal provides a representation of value. (b) LPFC region in which the BOLD response was related to self-control. The signal was stronger on trials in which participants exhibited self-control (did not choose a highly rated but nutritionally poor food) compared to trials in which they failed to exhibit self-control. The difference was especially pronounced in participants who, according to survey data, were rated as having good self-control. (c) Activity in the OFC increased with preference. (d) The left LPFC showed greater activity in successful self-control trials in the self-control group (left) than in the no-self-control group (right). Both groups showed greater activity in the LPFC for successful versus failed self-control (SC) tasks.

(الف) مناطق OFC که رابطه مثبتی بین پاسخ BOLD و ترجیح غذا نشان دادند. این سیگنال بازنمایی‌ای از ارزش فراهم می‌کند. (ب) ناحیه LPFC که در آن پاسخ BOLD با خودکنترلی مرتبط بود. این سیگنال در کارآزمایی‌هایی که شرکت‌کنندگان در آنها خودکنترلی نشان دادند (یک غذای با رتبه بالا اما از نظر تغذیه ضعیف را انتخاب نکردند) در مقایسه با کارآزمایی‌هایی که در آنها نتوانستند خودکنترلی نشان دهند، قوی‌تر بود. این تفاوت به‌ویژه در شرکت‌کنندگانی که بر اساس داده‌های نظرسنجی دارای خودکنترلی خوب ارزیابی شده بودند، مشهود بود. (ج) فعالیت در OFC با افزایش ترجیح افزایش یافت. (د) LPFC چپ در کارآزمایی‌های موفق خودکنترلی در گروه خودکنترلی (چپ) نسبت به گروه بدون خودکنترلی (راست) فعالیت بیشتری نشان داد. هر دو گروه در کارآزمایی‌های خودکنترلی (SC) موفق در مقایسه با ناموفق فعالیت بیشتری در LPFC نشان دادند.

More Than One Type of Decision System?

بیش از یک نوع سیستم تصمیم گیری؟

The laboratory is an artificial environment. Many of the experimental paradigms used to study decision making involve conditions in which the participant has ready access to the different choices and at least some information about the potential rewards and costs. Thus, participants are able to calculate and compare values. In the natural environment, especially the one our ancestors roamed about in, this situation is the exception, not the norm. More frequently, we must choose between an option with a known value and one or more options of unknown value (Rushworth et al., 2012).

آزمایشگاه یک محیط مصنوعی است. بسیاری از پارادایم‌های تجربی مورد استفاده برای مطالعه تصمیم گیری شامل شرایطی است که در آن شرکت کننده به انتخاب‌های مختلف و حداقل برخی اطلاعات در مورد پاداش‌ها و هزینه‌های بالقوه دسترسی آماده دارد. بنابراین، شرکت کنندگان قادر به محاسبه و مقایسه مقادیر هستند. در محیط طبیعی، به ویژه محیطی که اجداد ما در آن پرسه می‌زدند، این وضعیت استثنا است، نه یک هنجار. اغلب، ما باید بین یک گزینه با مقدار معلوم و یک یا چند گزینه با ارزش ناشناخته یکی را انتخاب کنیم (راشورث و همکاران، ۲۰۱۲).

The classic example is foraging: Animals have to make decisions about where to seek food and water, precious commodities that tend to occur in restricted locations and for only a short time. Foraging brings up questions that require decisions—such as, Do I keep eating/hunting/fishing here or move on to (what may or may not be) greener pastures, birdier bushes, or fishier water holes? In other words, do I continue to exploit the resources at hand or set out to explore in hopes of finding a richer niche? To make the decision, the animal must calculate the value of the current option, the richness of the overall environment, and the costs of exploration.

مثال کلاسیک جستجوی علوفه است: حیوانات باید تصمیم بگیرند که کجا به دنبال غذا و آب بگردند، کالاهای گرانبهایی که معمولاً در مکان‌های محدود و فقط برای مدت کوتاهی یافت می‌شوند. جستجوی علوفه سوالاتی را مطرح می‌کند که نیاز به تصمیم‌گیری دارند، مانند، آیا به خوردن/شکار/ماهیگیری در اینجا ادامه می‌دهم یا به مراتع سبزتر، بوته‌های پرنده‌تر، یا چاله‌های آب ماهی‌گیرتر (که ممکن است باشد یا نباشد) می‌روم؟ به عبارت دیگر، آیا به بهره برداری از منابع در دست ادامه می‌دهم یا به امید یافتن جایگاه غنی تری به اکتشاف می‌پردازم؟ برای تصمیم گیری، حیوان باید ارزش گزینه فعلی، غنای کلی محیط و هزینه‌های اکتشاف را محاسبه کند.

Worms, bees, wasps, spiders, fish, birds, seals, monkeys, and human subsistence foragers all obey a basic principle in their foraging behavior, referred to by economists as the “marginal value theorem” (Charnov, 1974). The animal exploits a foraging patch until its intake rate falls below the average intake rate for the overall environment. At that point, the animal becomes exploratory. Because this behavior is so consistent across so many species, scientists have hypothesized that this tendency may be deeply encoded in our genes. Indeed, biologists have identified a specific set of genes that influence how worms decide when it is time to start looking for “greener lawns” (Bendesky et al., 2011).

کرم‌ها، زنبورهای عسل، زنبورهای وحشی، عنکبوت‌ها، ماهی‌ها، پرندگان، فوک‌ها، میمون‌ها و انسان‌های جستجوگر معیشتی، همگی از یک اصل اساسی در رفتار جستجوی خود پیروی می‌کنند که اقتصاددانان آن را «قضیه ارزش حاشیه‌ای» می‌نامند (چارنوف، ۱۹۷۴). حیوان تا زمانی که نرخ دریافت غذا در یک پچ به کمتر از میانگین نرخ دریافت در کل محیط کاهش یابد، از آن پچ بهره‌برداری می‌کند. در آن نقطه، حیوان به اکتشاف روی می‌آورد. از آنجایی که این رفتار در گونه‌های بسیار زیادی تا این حد یکسان است، دانشمندان فرض کرده اند که این تمایل ممکن است عمیقاً در ژن‌های ما رمزگذاری شده باشد. در واقع، زیست‌شناسان مجموعه خاصی از ژن‌ها را شناسایی کرده‌اند که بر نحوه تصمیم‌گیری کرم‌ها درباره زمان شروع جستجوی «چمن‌های سبزتر» تأثیر می‌گذارند (Bendesky et al., 2011).
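The leaving rule of the marginal value theorem can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the function name `time_in_patch`, the geometric depletion of the intake rate (`decay`), and the numeric values are hypothetical and do not come from Charnov's formulation or from the studies cited in the text.

```python
def time_in_patch(initial_rate, decay, env_average_rate, max_steps=10_000):
    """Count the time steps a forager exploits one patch before leaving.

    Assumption (hypothetical): the intake rate in the patch shrinks
    geometrically by `decay` on every step as the food is depleted.
    The forager stays while its current intake rate is at least the
    average intake rate of the overall environment, then leaves.
    """
    rate = initial_rate
    steps = 0
    while rate >= env_average_rate and steps < max_steps:
        steps += 1      # exploit the patch for one more step
        rate *= decay   # eating depletes the patch
    return steps


# A poor environment (low average rate) keeps the forager in the patch longer
# than a rich environment (high average rate).
poor_environment = time_in_patch(initial_rate=10.0, decay=0.8, env_average_rate=4.0)
rich_environment = time_in_patch(initial_rate=10.0, decay=0.8, env_average_rate=6.0)
print(poor_environment, rich_environment)
```

The only point of the sketch is the comparison in the loop condition: the same patch is abandoned sooner when the surrounding environment is richer.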

Benjamin Hayden (2011) and his colleagues investigated the neuronal mechanisms that might be involved in foraging-like decisions. They hypothesized that such decisions require a decision variable, a representation that specifies the current value of leaving a patch, even if the alternative is relatively unknown. When this variable reaches a threshold, a signal is generated indicating that it is time to look for greener pastures. A number of factors influence how soon this threshold is reached: the current expected payoff, the expected benefits and costs for traveling to a new patch, and the uncertainty of obtaining reward at the next location. In our fishing example, for instance, if it takes 2 hours instead of 1 to go around the lake to a better fishing spot, you are less likely to move.

بنجامین هیدن (۲۰۱۱) و همکارانش مکانیسم‌های عصبی را که ممکن است در تصمیم گیری‌های جستجوگر نقش داشته باشند، بررسی کردند. آنها فرض کردند که چنین تصمیماتی به یک متغیر تصمیم نیاز دارد، نمایشی که مقدار فعلی ترک یک پچ را مشخص می‌کند، حتی اگر جایگزین نسبتا ناشناخته باشد. هنگامی‌که این متغیر به یک آستانه می‌رسد، سیگنالی تولید می‌شود که نشان می‌دهد زمان جستجوی مراتع سبزتر فرا رسیده است. تعدادی از عوامل بر سرعت رسیدن به این آستانه تأثیر می‌گذارند: بازده مورد انتظار فعلی، مزایا و هزینه‌های مورد انتظار برای سفر به یک پچ جدید، و عدم اطمینان در دریافت پاداش در مکان بعدی. به عنوان مثال، در مثال ماهیگیری ما، اگر به جای ۱، ۲ ساعت طول بکشد تا دریاچه را به یک نقطه ماهیگیری بهتر بپیمایید، احتمال حرکت شما کمتر است.

Hayden recorded from cells in the ACC (part of the medial PFC) of monkeys, choosing this region because it has been linked to the monitoring of actions and their outcomes (which we discuss later in the chapter). The animals were presented with a virtual foraging task in which they chose one of two targets. One stimulus (S1) was followed by a reward after a short delay, but the amount decreased with each successive trial (equivalent to remaining in a patch and reducing the food supply by eating it). The other stimulus (S2) allowed the animals to change the outcome contingencies. They received no reward on that trial, but after a variable period of time (the cost of exploration), the choices were presented again and the payoff for S1 was reset to its original value (a greener patch).

هیدن از سلول‌های ACC (بخشی از PFC داخلی) میمون‌ها ثبت گرفت؛ او این ناحیه را انتخاب کرد زیرا با نظارت بر اعمال و پیامدهای آنها مرتبط دانسته شده است (که در ادامه این فصل به آن خواهیم پرداخت). به حیوانات یک تکلیف جستجوی مجازی ارائه شد که در آن یکی از دو هدف را انتخاب می‌کردند. یک محرک (S1) پس از تأخیر کوتاهی با پاداش همراه می‌شد، اما مقدار پاداش با هر کارآزمایی متوالی کاهش می‌یافت (معادل باقی ماندن در یک پچ و کاهش ذخیره غذا با خوردن آن). محرک دیگر (S2) به حیوانات امکان می‌داد وابستگی‌های پیامد را تغییر دهند. آنها در آن کارآزمایی هیچ پاداشی دریافت نمی‌کردند، اما پس از یک دوره زمانی متغیر (هزینه اکتشاف)، انتخاب‌ها دوباره ارائه می‌شد و بازده S1 به مقدار اولیه آن بازنشانی می‌شد (یک پچ سبزتر).

Consistent with the marginal value theorem, the animals became less likely to choose S1 as the waiting time increased or the amount of reward decreased. What’s more, cellular activity in the ACC was highly predictive of the amount of time the animal would continue to “forage” by choosing S1. Most interesting, the cells showed the property of a threshold: When the firing rate was greater than 20 spikes per second, the animal left the patch (Figure 12.14).

مطابق با قضیه ارزش حاشیه ای، حیوانات با افزایش زمان انتظار یا کاهش مقدار پاداش، با احتمال کمتری S1 را انتخاب می‌کردند. علاوه بر این، فعالیت سلولی در ACC به‌خوبی پیش بینی می‌کرد که حیوان تا چه مدت با انتخاب S1 به «جستجوی غذا» ادامه می‌دهد. جالب‌تر از همه، سلول‌ها خاصیت آستانه را نشان دادند: وقتی نرخ شلیک از ۲۰ اسپایک در ثانیه بیشتر می‌شد، حیوان پچ را ترک می‌کرد (شکل ۱۲.۱۴).

The hypothesis that the ACC plays a critical role in foraging-like decisions is further supported by fMRI studies with humans (Kolling et al., 2012). When the person is making choices about where to sample in a virtual world, the BOLD response in ACC correlates positively with the search value (explore) and negatively with the encounter value (exploit), regardless of which choice participants made. In this condition, ventromedial regions of the PFC did not signal overall value. If, however, experimenters modified the task so that the participants were engaged in a comparison decision, activation in the OFC reflected the chosen option value. Taken together, these studies suggest that ACC signals exert a type of control by promoting a particular behavior: exploring the environment for better alternatives compared to the current course of action (Rushworth et al., 2012).

این فرضیه که ACC نقش مهمی‌در تصمیم گیری‌های جستجوگر ایفا می‌کند با مطالعات fMRI روی انسان‌ها بیشتر تأیید می‌شود (Kolling و همکاران، ۲۰۱۲). هنگامی‌که فرد در حال انتخاب مکان نمونه گیری در یک دنیای مجازی است، پاسخ BOLD در ACC با ارزش جستجو (اکتشاف) همبستگی مثبت و با ارزش مواجهه (بهره‌برداری) همبستگی منفی دارد، صرف نظر از اینکه شرکت کنندگان چه انتخابی انجام داده اند. در این شرایط، نواحی شکمی‌میانی PFC ارزش کلی را نشان نمی‌دادند. با این حال، اگر آزمایشگران تکلیف را طوری تغییر می‌دادند که شرکت‌کنندگان درگیر یک تصمیم مقایسه‌ای باشند، فعال‌سازی در OFC ارزش گزینه انتخاب شده را منعکس می‌کرد. روی هم رفته، این مطالعات نشان می‌دهد که سیگنال‌های ACC با ترویج یک رفتار خاص نوعی کنترل اعمال می‌کنند: کاوش در محیط برای یافتن جایگزین‌هایی بهتر از روند فعلی (راشورث و همکاران، ۲۰۱۲).

FIGURE 12.14 Neuronal activity in ACC is correlated with decisions by monkeys to change to a new “patch” in a sequential foraging task.
Data were sorted according to the amount of time the animal stayed in one patch (from shortest to longest: black, red, blue, purple). For each duration, the animal switched to a new patch when the firing rate of the ACC neurons was double the normal level of activity.

شکل ۱۲.۱۴ فعالیت عصبی در ACC با تصمیمات میمون‌ها برای تغییر به یک «پچ» جدید در یک کار جستجوی متوالی مرتبط است.
داده‌ها بر اساس مدت زمانی که حیوان در یک تکه ماند (از کوتاه ترین به طولانی ترین: سیاه، قرمز، آبی، بنفش) مرتب شدند. برای هر مدت، زمانی که سرعت شلیک نورون‌های ACC دو برابر سطح طبیعی فعالیت بود، حیوان به یک پچ جدید تغییر مکان داد.

It is obvious that people prefer to have a choice between two high-valued rewards rather than two low-valued rewards. For example, you would probably rather choose between two great job offers than two bad job offers. But what if the choice is between a high-valued reward and a low-valued reward? Logically, it would seem that this option should be less preferable to the choice between two high-valued rewards, since you are giving up an option to consider a desired outcome for an undesired outcome. However, “preference” is based on multiple factors. Although we like to face “win-win” options, having to choose between them creates anxiety. This anxiety is not present when the choice is simply a good option over a bad option.

بدیهی است که مردم ترجیح می‌دهند بین دو پاداش با ارزش بالا انتخاب کنند تا بین دو پاداش کم ارزش. به عنوان مثال، احتمالاً ترجیح می‌دهید بین دو پیشنهاد شغلی عالی انتخاب کنید تا بین دو پیشنهاد شغلی بد. اما اگر انتخاب بین یک پاداش با ارزش بالا و یک پاداش کم ارزش باشد چه؟ به طور منطقی، به نظر می‌رسد که این حالت باید کمتر از انتخاب بین دو پاداش با ارزش بالا مطلوب باشد، زیرا به جای یکی از دو نتیجه مطلوب، یک نتیجه نامطلوب پیش رو دارید. با این حال، «ترجیح» بر عوامل متعددی استوار است. اگرچه دوست داریم با گزینه‌های «برد-برد» روبرو شویم، ناچار بودن به انتخاب بین آنها اضطراب ایجاد می‌کند. وقتی انتخاب صرفاً بین یک گزینه خوب و یک گزینه بد باشد، این اضطراب وجود ندارد.

Harvard University researchers explored the brain systems involved with the simultaneous, though paradoxical, experiences of feeling both good and anxious.

محققان دانشگاه‌هاروارد سیستم‌های مغزی درگیر با تجربیات همزمان، هر چند متناقض، از احساس خوب و اضطراب را بررسی کردند.

While undergoing fMRI, participants chose between two low-valued products, a low- and a high-valued product, or two high-valued products in pairs of products that they had an actual chance of winning (in a lottery, conducted at the end of the experiment). The participants were also asked to rate how positive and anxious they felt about each choice (Shenhav & Buckner, 2014; Figure 12.15a).

در حین انجام fMRI، شرکت کنندگان بین دو محصول کم ارزش، یک محصول کم ارزش و یک محصول با ارزش بالا، یا دو محصول با ارزش بالا انتخاب کردند؛ این جفت‌ها از محصولاتی بودند که شرکت کنندگان شانس واقعی برنده شدن آنها را داشتند (در قرعه کشی‌ای که در پایان آزمایش انجام شد). همچنین از شرکت کنندگان خواسته شد میزان احساس مثبت و اضطراب خود را درباره هر انتخاب درجه‌بندی کنند (شنهاو و باکنر، ۲۰۱۴؛ شکل ۱۲.۱۵a).

As predicted, win-win choices (high value/high value) generated the most positive feelings but were also associated with the highest levels of anxiety (Figure 12.15b). You have great choices, but you are conflicted: Which is better? Conversely, the low/low choice ranked low on both scales. With bad options to choose from, you don’t really care which one you select: No conflict, no anxiety. Low/high choices led to high levels of positive feelings and low anxiety. You feel good and there is no conflict: The choice is a slam dunk.

همانطور که پیش‌بینی شد، انتخاب‌های برد-برد (ارزش بالا/ارزش بالا) بیشترین احساسات مثبت را ایجاد کردند، اما با بالاترین سطوح اضطراب نیز همراه بودند (شکل ۱۲.۱۵b). شما انتخاب‌های عالی دارید، اما دچار تعارض هستید: کدام بهتر است؟ برعکس، انتخاب کم/کم در هر دو مقیاس رتبه پایینی داشت. وقتی گزینه‌های پیش رو بد هستند، واقعاً برایتان مهم نیست کدام را انتخاب کنید: نه تعارضی هست و نه اضطرابی. انتخاب‌های کم/بالا به سطوح بالای احساسات مثبت و اضطراب کم منجر شد. احساس خوبی دارید و تعارضی وجود ندارد: انتخاب کاملاً روشن است.

The fMRI data revealed two dissociable neural circuits that correlated with these different variables influencing decision making (Figure 12.15c). The OFC tracked how positive the participant felt about the choices, consistent with the hypothesis that this region is important in the representation of the anticipated payoff or reward. In contrast, the BOLD response in the ACC tracked anxiety, with the highest level of activity on the difficult win-win (high/high) choices. Similar to what we saw in the discussion of foraging, ACC activation is predictive when there is conflict between one option or another, a hypothesis we will return to later in this chapter.

داده‌های fMRI دو مدار عصبی قابل تفکیک را نشان داد که با این متغیرهای مختلف تأثیرگذار بر تصمیم‌گیری همبستگی داشتند (شکل ۱۲.۱۵c). OFC میزان احساس مثبت شرکت کننده درباره انتخاب‌ها را ردیابی می‌کرد، که با این فرضیه سازگار است که این ناحیه در بازنمایی بازده یا پاداش پیش بینی شده اهمیت دارد. در مقابل، پاسخ BOLD در ACC اضطراب را ردیابی می‌کرد و بیشترین سطح فعالیت در انتخاب‌های دشوار برد-برد (بالا/بالا) دیده شد. مشابه آنچه در بحث جست‌وجوی غذا دیدیم، فعال‌سازی ACC زمانی پیش‌بینی‌کننده است که بین گزینه‌ها تعارض وجود داشته باشد؛ فرضیه‌ای که بعداً در این فصل به آن باز خواهیم گشت.

Dopamine Activity and Reward Processing

فعالیت دوپامین و پردازش پاداش

We have seen that rewards, especially those associated with primary reinforcers like food and sex, are fundamental to the behavior of all animals. It follows that the processing of such signals might involve phylogenetically older neural structures. Indeed, converging lines of evidence indicate that many subcortical areas represent reward information, including the basal ganglia, hypothalamus, amygdala, and lateral habenula (for a review, see O. Hikosaka et al., 2008). Much of the work on reward has focused on the neurotransmitter dopamine (DA). We should keep in mind, however, that reinforcement likely involves the interplay of many transmitters. For instance, evidence suggests that serotonin is important for the temporal discounting of reward value (S. C. Tanaka et al., 2007).

ما دیده‌ایم که پاداش‌ها، به‌ویژه آنهایی که با تقویت‌کننده‌های اولیه مانند غذا و رابطه جنسی مرتبط هستند، برای رفتار همه حیوانات ضروری است. نتیجه این است که پردازش چنین سیگنال‌هایی ممکن است شامل ساختارهای عصبی فیلوژنتیکی قدیمی‌تر باشد. در واقع، خطوط همگرای شواهد نشان می‌دهد که بسیاری از نواحی زیر قشری نشان دهنده اطلاعات پاداش هستند، از جمله عقده‌های پایه، هیپوتالاموس، آمیگدال، و‌هابنولاهای جانبی (برای بررسی، O. Hikosaka و همکاران، ۲۰۰۸ را ببینید). بیشتر کار روی پاداش بر انتقال دهنده عصبی دوپامین (DA) متمرکز شده است. با این حال، باید در نظر داشته باشیم که تقویت احتمالاً شامل تعامل بسیاری از فرستنده‌ها می‌شود. به عنوان مثال، شواهد نشان می‌دهد که سروتونین برای کاهش زمانی ارزش پاداش مهم است (S. C. Tanaka et al., 2007).

Dopaminergic (dopamine-activated) cells are scattered throughout the midbrain, sending axonal projections to many cortical and subcortical areas. Two of the primary loci of dopaminergic neurons are two brainstem nuclei, the substantia nigra pars compacta (SN) and the ventral tegmental area (VTA). As discussed in Chapter 8, the dopaminergic neurons from the substantia nigra project to the dorsal striatum, the major input nucleus of the basal ganglia. Loss of these neurons is related to the movement initiation problems observed in patients with Parkinson’s disease. Dopaminergic neurons that originate in the VTA project through two pathways: The mesolimbic pathway travels to structures important to emotional processing, and the mesocortical pathway travels to the neocortex, particularly to the medial portions of the frontal lobe.

سلول‌های دوپامینرژیک (فعال شده با دوپامین) در سرتاسر مغز میانی پراکنده‌اند و استطاله‌های آکسونی به بسیاری از نواحی قشری و زیر قشری می‌فرستند. دو جایگاه اصلی نورون‌های دوپامینرژیک، دو هسته ساقه مغز هستند: بخش متراکم جسم سیاه (SN) و ناحیه تگمنتال شکمی (VTA). همانطور که در فصل ۸ بحث شد، نورون‌های دوپامینرژیک جسم سیاه به جسم مخطط پشتی، هسته ورودی اصلی عقده‌های پایه، آکسون می‌فرستند. از دست رفتن این نورون‌ها با مشکلات شروع حرکت که در بیماران مبتلا به پارکینسون دیده می‌شود مرتبط است. نورون‌های دوپامینرژیکی که از VTA منشأ می‌گیرند از طریق دو مسیر آکسون می‌فرستند: مسیر مزولیمبیک به ساختارهای مهم برای پردازش هیجانی می‌رود و مسیر مزوکورتیکال به نئوکورتکس، به‌ویژه به بخش‌های داخلی لوب فرونتال.

FIGURE 12.15 Neural correlates of decision making under stress.
(a) On each trial, participants indicated their preferred choice between a pair of items, each of which could be of either low or high value. The selected items were entered in a lottery, and one was chosen at random as the reward for participating in the experiment. (b) Ratings of desirability (positive affect, top) and anxiety (bottom) for the three conditions. (c) Brain regions in which the BOLD response was correlated with higher levels of positive affect (green) or higher levels of anxiety (red). The former was most evident in OFC; the latter was most evident in ACC, part of the medial frontal cortex.

شکل ۱۲.۱۵ همبستگی عصبی تصمیم گیری تحت استرس.
(الف) در هر کارآزمایی، شرکت‌کنندگان انتخاب ترجیحی خود را بین یک جفت آیتم، که هر یک می‌توانست دارای ارزش کم یا زیاد باشد، نشان دادند. آیتم‌های انتخاب شده در قرعه کشی وارد شدند و یکی از آنها به صورت تصادفی به عنوان پاداش شرکت در آزمایش انتخاب شد. (ب) درجه بندی مطلوبیت (عاطفه مثبت، بالا) و اضطراب (پایین) برای سه حالت. (ج) نواحی مغز که در آنها پاسخ BOLD با سطوح بالاتر عاطفه مثبت (سبز) یا سطوح بالاتر اضطراب (قرمز) مرتبط بود. اولی بیشتر در OFC مشهود بود. مورد دوم در ACC، بخشی از قشر فرونتال داخلی بیشتر مشهود بود.

The link between dopamine and reward began with the classic work of James Olds and Peter Milner in the early 1950s (Olds, 1958; Olds & Milner, 1954). They implanted electrodes into the brains of rats and then gave the rats the opportunity to control the electrodes. When the rat pushed a lever, the electrode became activated. Some of the rats rarely pressed the lever. Others pressed the lever like crazy. The difference turned out to be the location of the electrodes. The rats that couldn’t stop self-stimulating were the ones whose electrodes were activating dopaminergic pathways.

پیوند بین دوپامین و پاداش با آثار کلاسیک جیمز اولدز و پیتر میلنر در اوایل دهه ۱۹۵۰ آغاز شد (اولدز، ۱۹۵۸؛ اولدز و میلنر، ۱۹۵۴). آنها الکترودهایی را در مغز موش‌ها کاشتند و سپس به موش‌ها این فرصت را دادند که الکترودها را کنترل کنند. هنگامی‌که موش یک اهرم را فشار داد، الکترود فعال شد. برخی از موش‌ها به ندرت اهرم را فشار می‌دادند. دیگران دیوانه وار اهرم را فشار دادند. تفاوت در محل الکترودها بود. موش‌هایی که نمی‌توانستند خود تحریکی را متوقف کنند، موش‌هایی بودند که الکترودهایشان مسیرهای دوپامینرژیک را فعال می‌کرد.

Originally, neuroscientists thought of dopamine as the neural correlate of reward, but this hypothesis turned out to be too simplistic. A key challenge to the reward hypothesis came about when investigators recognized that the activation of dopaminergic neurons was not tied to the size of the reward per se, but was more closely related to the expectancy of reward (Schultz, 1998). Specifically, for a given amount of reward, the activity of the dopaminergic neurons is much higher when that reward is unexpected compared to when it is expected. This observation led to a new view of the role of dopamine in reinforcement and decision making.

در ابتدا، دانشمندان علوم اعصاب دوپامین را به عنوان همبستگی عصبی پاداش در نظر می‌گرفتند، اما معلوم شد که این فرضیه بیش از حد ساده است. یک چالش کلیدی برای فرضیه پاداش زمانی به وجود آمد که محققین تشخیص دادند که فعال شدن نورون‌های دوپامینرژیک به اندازه پاداش به خودی خود مرتبط نیست، بلکه بیشتر با انتظار پاداش مرتبط است (شولتز، ۱۹۹۸). به طور خاص، برای مقدار معینی از پاداش، فعالیت نورون‌های دوپامینرژیک زمانی که آن پاداش غیرمنتظره باشد در مقایسه با زمانی که انتظار می‌رود، بسیار بیشتر است. این مشاهدات منجر به دیدگاه جدیدی از نقش دوپامین در تقویت و تصمیم گیری شد.

DOPAMINE AND PREDICTION ERROR

دوپامین و خطای پیش بینی

We know from experience that the value of an item can change. Your favorite fishing hole may no longer be a favorite with the fish. After a couple of unsuccessful visits, you update your value (now that fishing hole is not your favorite either) and you look for a new spot. How do we learn and update the values associated with different stimuli and actions? An updating process is essential, because the environment may change. Updating is also essential because our own preferences change over time. Think about that root beer float. Would you be so eager to drink one if you had just downed a couple of ice cream cones?

ما به تجربه می‌دانیم که ارزش یک چیز می‌تواند تغییر کند. محل ماهیگیری مورد علاقه شما ممکن است دیگر مورد علاقه ماهی‌ها نباشد. پس از چند بازدید ناموفق، ارزش آن را به روز می‌کنید (اکنون آن محل دیگر مورد علاقه شما هم نیست) و به دنبال نقطه جدیدی می‌گردید. چگونه ارزش‌های مرتبط با محرک‌ها و اعمال مختلف را یاد می‌گیریم و به روز می‌کنیم؟ فرآیند به روز رسانی ضروری است، زیرا ممکن است محیط تغییر کند. به روز رسانی همچنین از این رو ضروری است که ترجیحات خود ما در طول زمان تغییر می‌کند. به آن روت‌بیر فلوت (نوشابه روت‌بیر با بستنی) فکر کنید. اگر همین حالا چند بستنی قیفی خورده بودید، باز هم همان‌قدر مشتاق نوشیدنش بودید؟

Wolfram Schultz (1998) and his colleagues conducted a series of revealing experiments using a simple Pavlovian conditioning task with monkeys (see Chapter 9). The animals were trained such that a light, the conditioned stimulus (CS), was followed after a few seconds by an unconditioned stimulus (US), a sip of juice. To study the role of dopamine, Schultz recorded from dopaminergic cells in the VTA. As expected, when the training procedure started, the cells showed a large burst of activity after the US was presented (Figure 12.16a). Such a response could be viewed as representing the reward. When the CS-US events were repeatedly presented, however, two interesting things occurred. First, the dopamine response to the juice, the US, decreased over time. Second, the cells started to fire when the light, the CS, was presented. That is, the dopamine response gradually shifted from the US to the CS (Figure 12.16b).

ولفرام شولتز (۱۹۹۸) و همکارانش مجموعه‌ای از آزمایش‌های روشنگر را با استفاده از یک تکلیف ساده شرطی سازی پاولفی با میمون‌ها انجام دادند (به فصل ۹ مراجعه کنید). حیوانات به گونه ای آموزش دیدند که پس از یک نور، یعنی محرک شرطی (CS)، با فاصله چند ثانیه یک محرک غیرشرطی (US)، یعنی یک جرعه آب میوه، ارائه می‌شد. برای مطالعه نقش دوپامین، شولتز از سلول‌های دوپامینرژیک در VTA ثبت گرفت. همانطور که انتظار می‌رفت، در آغاز آموزش، سلول‌ها پس از ارائه US انفجار بزرگی از فعالیت نشان دادند (شکل ۱۲.۱۶a). چنین پاسخی را می‌توان بازنمایی پاداش دانست. با این حال، وقتی رویدادهای CS-US مکرراً ارائه شدند، دو اتفاق جالب رخ داد. اول، پاسخ دوپامین به آب میوه، یعنی US، در طول زمان کاهش یافت. دوم، سلول‌ها با ارائه نور، یعنی CS، شروع به شلیک کردند. یعنی پاسخ دوپامین به تدریج از US به CS منتقل شد (شکل ۱۲.۱۶b).

A reinforcement account of the reduced response to the US might emphasize that the value of the reward drops over time as the animal feels less hungry. Still, this hypothesis could not account for why the CS now triggers a dopamine response. The response here seems to suggest that the CS has now become rewarding.

یک تبیین مبتنی بر تقویت برای کاهش پاسخ به US ممکن است تأکید کند که ارزش پاداش با گذشت زمان کاهش می‌یابد زیرا حیوان کمتر احساس گرسنگی می‌کند. با این حال، این فرضیه نمی‌تواند توضیح دهد که چرا CS اکنون پاسخ دوپامین را برمی‌انگیزد. به نظر می‌رسد این پاسخ نشان می‌دهد که CS اکنون خود پاداش‌دهنده شده است.

FIGURE 12.16 Dopamine neurons respond to an error in prediction. These raster plots show spikes in a midbrain dopamine neuron on single trials, with the data across trials summarized in the histograms at the top of each panel. (a) In the absence of a conditioned stimulus (CS), the DA neuron shows a burst of activity when the unpredicted reward (R), a drop of juice, is given. (b) When the CS is repeatedly paired with the reward, the DA neuron shows a temporal shift, now firing when the CS is presented because this is the unexpected, positive event. (c) On trials in which the predicted reward is not given, the neuron shows a positive prediction error after the CS (as in b) and a negative prediction error around the time of expected reward.

شکل ۱۲.۱۶ نورون‌های دوپامین به خطا در پیش بینی پاسخ می‌دهند. این نمودارهای رستری، اسپایک‌های یک نورون دوپامینی مغز میانی را در کارآزمایی‌های منفرد نشان می‌دهند و داده‌های همه کارآزمایی‌ها در هیستوگرام‌های بالای هر پانل خلاصه شده است. (الف) در غیاب محرک شرطی (CS)، نورون DA هنگامی‌که پاداش پیش بینی نشده (R)، یعنی یک قطره آب میوه، داده می‌شود، انفجاری از فعالیت نشان می‌دهد. (ب) هنگامی‌که CS به طور مکرر با پاداش جفت می‌شود، نورون DA یک جابه‌جایی زمانی نشان می‌دهد و اکنون هنگام ارائه CS شلیک می‌کند، زیرا این همان رویداد غیرمنتظره و مثبت است. (ج) در کارآزمایی‌هایی که پاداش پیش‌بینی‌شده داده نمی‌شود، نورون پس از CS یک خطای پیش‌بینی مثبت (مانند ب) و حوالی زمان پاداش مورد انتظار یک خطای پیش‌بینی منفی نشان می‌دهد.

Schultz proposed a new hypothesis to account for the role of dopamine in reward-based learning. Rather than thinking of the spike in DA neuron activity as representing the reward, he suggested it should be viewed as a reward prediction error (RPE), a signal that represents the difference between the obtained reward and the expected reward. First, consider the reduction in the dopaminergic response to the juice. On the first trial, the animal has not learned that the light is always followed by the juice. Thus, the animal does not expect to receive a reward following the light, but a reward is given. This event results in a positive RPE, because the obtained reward is greater than the expected reward: DA is released. With repeated presentation of the light-juice pairing, however, the animal comes to expect a reward when the light is presented. As the expected and obtained values become more similar, the size of the positive RPE is reduced and the dopaminergic response becomes attenuated.

شولتز یک فرضیه جدید برای توضیح نقش دوپامین در یادگیری مبتنی بر پاداش ارائه کرد. به‌جای اینکه اوج فعالیت نورون DA را نشان‌دهنده پاداش بداند، پیشنهاد کرد که باید به عنوان یک خطای پیش‌بینی پاداش (RPE) در نظر گرفته شود، سیگنالی که نشان‌دهنده تفاوت بین پاداش به‌دست‌آمده و پاداش مورد انتظار است. ابتدا، کاهش پاسخ دوپامینرژیک به آب میوه را در نظر بگیرید. در اولین آزمایش، حیوان یاد نگرفته است که نور همیشه توسط آب میوه دنبال می‌شود. بنابراین، حیوان انتظار ندارد به دنبال نور پاداشی دریافت کند، بلکه پاداشی داده می‌شود. این رویداد منجر به RPE مثبت می‌شود، زیرا پاداش به دست آمده بیشتر از پاداش مورد انتظار است: DA آزاد می‌شود. با این حال، با ارائه مکرر جفت نور-آبمیوه، حیوان با ارائه نور انتظار پاداش دارد. همانطور که مقادیر مورد انتظار و بدست آمده شبیه تر می‌شوند، اندازه RPE مثبت کاهش می‌یابد و پاسخ دوپامینرژیک ضعیف می‌شود.
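The definition of the reward prediction error, obtained reward minus expected reward, can be made concrete with a small Python sketch. The update rule used here (moving the expectation toward the obtained reward by a fixed `learning_rate`, in the style of the Rescorla-Wagner model) is an assumption added for illustration; the text states only that the error shrinks as expectation catches up with the reward. The function name `simulate_rpe` is hypothetical.

```python
def simulate_rpe(n_trials, reward=1.0, learning_rate=0.5):
    """Return the reward prediction error (RPE) on each of n_trials.

    RPE = obtained reward - expected reward.
    Assumption (hypothetical): after each trial the expectation moves
    toward the obtained reward by a fraction `learning_rate` of the RPE.
    """
    expected = 0.0                 # before training, no reward is expected
    errors = []
    for _ in range(n_trials):
        rpe = reward - expected    # positive when the reward exceeds expectation
        errors.append(rpe)
        expected += learning_rate * rpe
    return errors


print(simulate_rpe(4))  # [1.0, 0.5, 0.25, 0.125]
```

The error is largest on the first trial, when the juice is entirely unexpected, and shrinks as the expected and obtained values converge, which corresponds to the attenuated dopaminergic response to the juice.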

Now consider the increase in the dopaminergic response to the light. When the animal is sitting in the test apparatus between trials, it has no expectancy of reward and does not associate the light with a reward. Thus, when the light flashes, the animal has no expectation (it is just hanging out), so there is no RPE. Expectation is low, and the reward is associated with the juice (yippee!), not the light (Figure 12.16a). As the animal experiences rewards after a light flash, however, it begins to associate the light with the juice, and the onset of the light results in a positive RPE (Figure 12.16b). This positive RPE is represented by the dopaminergic response to the light.

اکنون افزایش پاسخ دوپامینرژیک به نور را در نظر بگیرید. هنگامی‌که حیوان در فاصله بین آزمایش‌ها در دستگاه آزمایش نشسته است، هیچ انتظاری برای پاداش ندارد و نور را با پاداش مرتبط نمی‌کند. بنابراین، هنگامی‌که نور چشمک می‌زند، حیوان هیچ انتظاری ندارد (فقط وقت می‌گذراند)، پس RPE وجود ندارد. انتظار کم است و پاداش با آب میوه (هورا!) مرتبط است، نه با نور (شکل ۱۲.16a). با این حال، وقتی حیوان پس از چشمک نور بارها پاداش را تجربه می‌کند، شروع به مرتبط کردن نور با آب میوه می‌کند و شروع نور منجر به RPE مثبت می‌شود (شکل ۱۲.16b). این RPE مثبت با پاسخ دوپامینرژیک به نور نشان داده می‌شود.
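The shift of the dopaminergic burst from the juice to the light can be illustrated with a small simulation. This is only a sketch under stated assumptions, not the model used by Schultz: the names `trial_rpes` and `train`, the learning rate `alpha`, and the reward size of 1.0 are all hypothetical, and the expectation just before the light is taken to be a zero baseline because the light arrives unannounced.

```python
def trial_rpes(v_cue, reward, baseline=0.0):
    """Return (RPE at cue onset, RPE at reward time) for one trial."""
    rpe_cue = v_cue - baseline   # the cue is unannounced, so prior expectation is the baseline
    rpe_reward = reward - v_cue  # obtained reward minus what the cue predicted
    return rpe_cue, rpe_reward


def train(n_trials, alpha=0.1, reward=1.0):
    """Pair cue and reward n_trials times; return the learned cue value."""
    v_cue = 0.0
    for _ in range(n_trials):
        _, rpe_reward = trial_rpes(v_cue, reward)
        v_cue += alpha * rpe_reward  # move the cue value toward the obtained reward
    return v_cue


naive = trial_rpes(train(0), reward=1.0)      # (0.0, 1.0): burst at the reward only
trained = trial_rpes(train(200), reward=1.0)  # close to (1.0, 0.0): burst has moved to the cue
omitted = trial_rpes(train(200), reward=0.0)  # close to (1.0, -1.0): dip when reward is withheld
print(naive, trained, omitted)
```

The three cases correspond loosely to panels (a), (b), and (c) of Figure 12.16: an untrained response to the reward, a trained response to the cue, and a dip when an expected reward fails to arrive.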

To calculate an RPE, a neuron must have two inputs: one corresponding to the predicted reward and one indicating the actual reward. Naoshige Uchida’s lab has been investigating this problem, asking whether DA neurons actually do the calculation of reward prediction errors or if this information is instead calculated upstream and then passed on to the DA neurons.

برای محاسبه RPE، یک نورون باید دو ورودی داشته باشد: یکی مربوط به پاداش پیش‌بینی‌شده و دیگری نشان‌دهنده پاداش واقعی. آزمایشگاه نائوشیگه اوچیدا در حال بررسی این مشکل بوده است و می‌پرسد که آیا نورون‌های DA واقعاً خطاهای پیش‌بینی پاداش را محاسبه می‌کنند یا اینکه این اطلاعات در عوض در بالادست محاسبه شده و سپس به نورون‌های DA منتقل می‌شوند.

In an ingenious series of experiments, these researchers provided evidence in favor of the former hypothesis. Their first step was to determine whether DA neurons received input of the actual reward. To answer this question, they injected special retrograde tracers into the VTA, which were taken up by axons terminating on DA neurons. This procedure enabled the researchers to identify all of the inputs to the DA neurons. From a combination of cellular recording and optogenetics (discussed in Chapter 3), they characterized the inputs from a broadly distributed set of subcortical areas, such as the hypothalamus (Tian & Uchida, 2015). Some of these had the signature of a reward signal; namely, the activity level scaled with the amount of reward.

این محققان در یک سری آزمایش‌های مبتکرانه، شواهدی را به نفع فرضیه قبلی ارائه کردند. اولین قدم آنها تعیین اینکه آیا نورون‌های DA ورودی پاداش واقعی را دریافت کرده اند یا خیر بود. برای پاسخ به این سوال، آنها ردیاب‌های پس‌رونده خاصی را به VTA تزریق کردند که توسط آکسون‌هایی که به نورون‌های DA ختم می‌شوند، جذب شدند. این روش محققان را قادر می‌سازد تا تمام ورودی‌های نورون‌های DA را شناسایی کنند. از ترکیبی از ثبت سلولی و اپتوژنتیک (در فصل ۳ بحث شد)، آنها ورودی‌های یک مجموعه گسترده از نواحی زیر قشری مانند هیپوتالاموس را مشخص کردند (تیان و اوچیدا، ۲۰۱۵). برخی از آنها دارای امضای علامت پاداش بودند. یعنی، سطح فعالیت با مقدار پاداش مقیاس شده است.

These researchers then examined the neuronal mechanisms of reward predictions, focusing on the input to DA neurons from neighboring GABA neurons (Eshel et al., 2015). They developed an optogenetic label in mice that enabled them to specifically control the activity of these GABA neurons while simultaneously recording from DA neurons. The mice were then trained on a task in which an odor signaled the likelihood of reward: Odor A had a 10% chance of reward, and Odor B had a 90% chance of reward.

این محققان سپس مکانیسم‌های عصبی پیش‌بینی پاداش را با تمرکز بر ورودی نورون‌های DA از نورون‌های GABA همسایه بررسی کردند (Eshel et al., 2015). آنها یک برچسب اپتوژنتیک را در موش‌ها ایجاد کردند که آنها را قادر می‌ساخت تا به طور خاص فعالیت این نورون‌های GABA را کنترل کنند و همزمان از نورون‌های DA ثبت کنند. سپس موش‌ها برای انجام کاری آموزش دیدند که در آن بو، احتمال پاداش را نشان می‌دهد: بوی A 10% شانس پاداش داشت و بوی B 90% شانس پاداش داشت.

First let’s consider the response of the GABA and DA neurons in the absence of optogenetic stimulation (Figure 12.17a). At the time of the actual reward, the DA neurons have a much stronger response to Odor A than to Odor B: Because the reward is not expected with Odor A, there is a stronger positive RPE. The reduced DA response to Odor B is associated with an increase in the firing rate of the inhibitory GABA interneurons (not shown) at the onset of the odor, an effect that persists until the reward delivery.

ابتدا بیایید پاسخ نورون‌های GABA و DA را در غیاب تحریک اپتوژنتیک در نظر بگیریم (شکل ۱۲.17a). در زمان پاداش واقعی، نورون‌های DA پاسخ بسیار قوی‌تری به بوی A نسبت به بوی B دارند: از آنجایی که پاداش با بوی A انتظار نمی‌رود، RPE مثبت قوی‌تری وجود دارد. پاسخ کاهش‌یافته DA به بوی B با افزایش نرخ شلیک نورون‌های رابط مهاری GABA (نشان داده نشده) در شروع بو همراه است، اثری که تا زمان تحویل پاداش ادامه دارد.


FIGURE 12.17 GABA Input to dopamine neurons provides a neuronal signal of reward prediction. (a) Firing rate of DA neurons to onset of cue (time = 0 s) and juice reward (dashed line). Reward occurs on 10% of the trials for Odor A and 90% of the trials for Odor B. Because Odor B has a higher reward prediction, the positive RPE response of the DA neuron is lower to Odor B at the time of the juice reward. (b) Activity of GABA neurons to Odor B either without optogenetic stimulation (blue) or with optogenetic stimulation (orange). The green arrow indicates onset of stimulation. The results confirm that the optogenetic tag was effective in decreasing activity in GABA neurons. (c) Optogenetic inhibition of the GABA neurons led to an increase in the DA response to Odor B, confirming that the GABA input to DA neurons can signal a predicted reward.

شکل ۱۲.۱۷ ورودی گابا به نورون‌های دوپامین سیگنال عصبی پیش بینی پاداش را ارائه می‌دهد. (الف) نرخ شلیک نورون‌های DA در پاسخ به شروع نشانه (زمان = 0 ثانیه) و پاداش آب میوه (خط چین). پاداش در ۱۰ درصد آزمایش‌ها برای بوی A و ۹۰ درصد آزمایش‌ها برای بوی B رخ می‌دهد. از آنجایی که بوی B پیش‌بینی پاداش بالاتری دارد، پاسخ RPE مثبت نورون DA به بوی B در زمان پاداش آب میوه کمتر است. (ب) فعالیت نورون‌های GABA در پاسخ به بوی B، بدون تحریک اپتوژنتیک (آبی) یا با تحریک اپتوژنتیک (نارنجی). فلش سبز نشان دهنده شروع تحریک است. نتایج تایید می‌کنند که برچسب اپتوژنتیک در کاهش فعالیت نورون‌های GABA مؤثر بوده است. (ج) مهار اپتوژنتیکی نورون‌های GABA منجر به افزایش پاسخ DA به بوی B شد و تأیید می‌کند که ورودی GABA به نورون‌های DA می‌تواند یک پاداش پیش‌بینی‌شده را نشان دهد.

Now consider what happens on trials in which Odor B is presented along with the optogenetic silencing of the GABA neurons. Confirming that the manipulation is successful, the GABA neurons’ firing rate drops off as soon as the light is turned on (green region in Figure 12.17b). More interesting, removing this inhibitory input to the DA neurons produces an increase in the response of the DA neuron (Figure 12.17c). In contrast, when the GABA neurons are inhibited, the DA response to Odor A is minimally affected (not shown).

حال در نظر بگیرید که چه اتفاقی در آزمایش‌هایی می‌افتد که در آن بوی B همراه با خاموش کردن نورون‌های GABA ارائه می‌شود. با تأیید موفقیت آمیز بودن دستکاری، سرعت شلیک نورون‌های GABA به محض روشن شدن نور کاهش می‌یابد (منطقه سبز در شکل ۱۲.17b). جالب تر، حذف این ورودی بازدارنده به نورون‌های DA باعث افزایش پاسخ نورون DA می‌شود (شکل ۱۲.17c). در مقابل، زمانی که نورون‌های GABA مهار می‌شوند، پاسخ DA به بوی A حداقل تحت تأثیر قرار می‌گیرد (نشان داده نمی‌شود).

These results indicate that GABA neurons provide a signal of reward expectancy to DA neurons. In combination with the evidence showing that the DA neurons receive inputs about the actual reward, we see that the DA neurons are ideally positioned to calculate reward prediction errors.

این نتایج نشان می‌دهد که نورون‌های GABA سیگنال امید به پاداش را به نورون‌های DA ارائه می‌دهند. در ترکیب با شواهدی که نشان می‌دهد نورون‌های DA ورودی‌های مربوط به پاداش واقعی را دریافت می‌کنند، می‌بینیم که نورون‌های DA برای محاسبه خطاهای پیش‌بینی پاداش در موقعیت ایده‌آلی قرار دارند.
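One way to picture this arrangement is as a subtraction carried out at the DA neuron: an excitatory input carrying the actual reward, minus an inhibitory GABA input carrying the expectation. The sketch below assumes exactly that, with the expectation set equal to each odor's reward probability. The function name `da_response`, the `gaba_gain` parameter, and the value 0.5 used for partial silencing are hypothetical choices for illustration, not quantities reported in the study.

```python
def da_response(reward_input, expectation, gaba_gain=1.0):
    """Toy DA response: reward input minus the GABA-borne expectation."""
    return reward_input - gaba_gain * expectation


# Reward probabilities from the task, used here as the expectation signal.
EXPECTATION = {"A": 0.1, "B": 0.9}

for odor, expected in EXPECTATION.items():
    intact = da_response(1.0, expected)                   # rewarded trial, GABA input intact
    silenced = da_response(1.0, expected, gaba_gain=0.5)  # GABA input halved (hypothetical)
    print(odor, round(intact, 2), round(silenced, 2))
```

In this toy model the response to Odor B rises from about 0.1 to 0.55 when the GABA input is weakened, while the response to Odor A moves only from 0.9 to 0.95, which matches the qualitative pattern described above.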

The prediction error model has proved to be an important idea for thinking about how dopamine is related to both reinforcement and learning. We have described the case in which the obtained reward is greater than the expected reward, resulting in a positive RPE. We can also consider situations with negative RPEs, cases in which the obtained reward is less than the expected reward. This situation happens during a trial when the experimenter meanly withholds the reward (the juice, returning to the example of Figure 12.16) after presenting the CS (the light). Now there is a dip in the response of the DA neuron around the time when the juice was expected (Figure 12.16c).

ثابت شده است که مدل خطای پیش‌بینی ایده مهمی برای تفکر در مورد ارتباط دوپامین با تقویت و یادگیری است. ما موردی را شرح داده‌ایم که در آن پاداش به‌دست‌آمده بیشتر از پاداش مورد انتظار است که منجر به RPE مثبت می‌شود. همچنین می‌توانیم موقعیت‌هایی را با RPE منفی در نظر بگیریم، مواردی که در آن پاداش به‌دست‌آمده کمتر از پاداش مورد انتظار است. این وضعیت در آزمایشی رخ می‌دهد که آزمایش‌کننده با بدجنسی، پس از ارائه CS (نور)، پاداش (آب میوه، با بازگشت به مثال شکل ۱۲.۱۶) را دریغ می‌کند. اکنون در حول و حوش زمانی که آب میوه مورد انتظار بود، افتی در پاسخ نورون DA دیده می‌شود (شکل ۱۲.16c).

This negative RPE occurs because the animal is expecting the juice, but none is obtained. If the juice is repeatedly withheld, the size of both the increase in the dopaminergic response to the light and the decrease in the dopaminergic response to the absence of the juice are reduced. This situation corresponds to the phenomenon of extinction, in which a response previously associated with a stimulus is no longer produced. With enough trials, the DA neurons show no change in baseline firing rates. The light is no longer reinforcing (so the positive RPE to the light is extinguished), and the absence of the juice is no longer a violation of an expectancy (so the negative RPE when the juice was anticipated is also abolished).

این RPE منفی به این دلیل رخ می‌دهد که حیوان منتظر آب میوه است، اما آب میوه‌ای دریافت نمی‌کند. اگر آب میوه به طور مکرر دریغ شود، هم اندازه افزایش پاسخ دوپامینرژیک به نور و هم اندازه کاهش پاسخ دوپامینرژیک به نبود آب میوه کم می‌شود. این وضعیت با پدیده خاموشی (extinction) مطابقت دارد که در آن پاسخی که قبلاً با یک محرک مرتبط بود دیگر تولید نمی‌شود. با آزمایش‌های کافی، نورون‌های DA هیچ تغییری نسبت به نرخ شلیک پایه نشان نمی‌دهند. نور دیگر تقویت کننده نیست (بنابراین RPE مثبت به نور خاموش می‌شود) و نبود آب میوه دیگر نقض انتظار نیست (بنابراین RPE منفی در زمانی که آب میوه انتظار می‌رفت نیز از بین می‌رود).
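Extinction can be sketched with the same kind of error-driven bookkeeping. The example below assumes a fully trained cue value of 1.0 and a hypothetical learning rate `alpha`; the function name `extinguish` is likewise illustrative. On every trial the light is shown and the juice is withheld, so the expectation attached to the light is gradually unlearned.

```python
def extinguish(v_cue, n_trials, alpha=0.2):
    """Present the cue without reward; return per-trial (cue RPE, omission RPE)."""
    history = []
    for _ in range(n_trials):
        rpe_cue = v_cue             # response to the light, relative to a zero baseline
        rpe_omission = 0.0 - v_cue  # no juice arrives: obtained minus expected is negative
        history.append((rpe_cue, rpe_omission))
        v_cue += alpha * rpe_omission  # the negative RPE lowers the value of the light
    return history


history = extinguish(v_cue=1.0, n_trials=30)
print(history[0])   # (1.0, -1.0)
print(history[-1])  # both entries are now close to zero
```

Both the positive response to the light and the negative response at the expected juice time shrink together, and after enough trials neither departs noticeably from baseline.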

As we have seen in this example, the dopaminergic response changes with learning. Indeed, scientists have recognized that the prediction error signal itself can be useful for reinforcement learning, serving as a teaching signal. As discussed earlier, models of decision making assume that events in the world (or internal states) have associated values. Juice is a valued commodity, especially to a thirsty monkey. Over time, the light also becomes a valued stimulus, signaling the upcoming reward. The RPE signal can be used to update representations of value. Computationally, this process can be described as taking the current value representation and adding to it the RPE scaled by a weighting factor, or gain (Dayan & Niv, 2008). If the RPE is positive, the net result is an increase in value. If the RPE is negative, the net result is a decrease in value.

همانطور که در این مثال دیدیم، پاسخ دوپامینرژیک با یادگیری تغییر می‌کند. در واقع، دانشمندان دریافته‌اند که سیگنال خطای پیش‌بینی خود می‌تواند برای یادگیری تقویتی مفید باشد و به عنوان یک سیگنال آموزشی عمل کند. همانطور که قبلاً بحث شد، مدل‌های تصمیم‌گیری فرض می‌کنند که رویدادهای جهان (یا حالت‌های درونی) دارای ارزش‌های مرتبط هستند. آب میوه یک کالای با ارزش است، به ویژه برای یک میمون تشنه. با گذشت زمان، نور نیز به یک محرک ارزشمند تبدیل می‌شود که پاداش پیش رو را علامت می‌دهد. سیگنال RPE می‌تواند برای به روز رسانی بازنمایی‌های ارزش استفاده شود. از نظر محاسباتی، این فرآیند را می‌توان چنین توصیف کرد: بازنمایی ارزش فعلی گرفته می‌شود و RPE، پس از ضرب در یک ضریب وزنی (بهره)، به آن افزوده می‌شود (Dayan & Niv، ۲۰۰۸). اگر RPE مثبت باشد، نتیجه خالص افزایش ارزش است. اگر RPE منفی باشد، نتیجه خالص کاهش ارزش است.

This elegant yet simple model not only predicts how values are updated, but also accounts for changes in the amount that is learned from one trial to the next. Early in training, the value of the light is low. The large RPE that occurs when it is followed by the juice will lead to an increase in the value associated with the light. With repeated trials, though, the size of the RPE decreases, so subsequent increases in the value of the light also become smaller.

این مدل زیبا و در عین حال ساده نه تنها نحوه به‌روزرسانی ارزش‌ها را پیش‌بینی می‌کند، بلکه تغییر در مقدار یادگیری از یک آزمایش به آزمایش بعد را نیز توضیح می‌دهد. در اوایل آموزش، ارزش نور کم است. RPE بزرگی که هنگام دنبال شدن نور با آب میوه رخ می‌دهد، منجر به افزایش ارزش مرتبط با نور می‌شود. اما با آزمایش‌های مکرر، اندازه RPE کاهش می‌یابد، بنابراین افزایش‌های بعدی در ارزش نور نیز کوچک‌تر می‌شوند.

This process, in which learning is initially rapid and then occurs in much smaller increments over time, is characteristic of almost all learning functions. Although this effect might occur for many reasons (e.g., the benefits of practice diminish over time), the impressive thing is that it is predicted by a simple model in which value representations are updated by a simple mechanism based on the difference between the predicted and obtained reward.

این فرآیند، که در آن یادگیری در ابتدا سریع است و سپس با گام‌های بسیار کوچک‌تر در طول زمان رخ می‌دهد، ویژگی تقریباً همه توابع یادگیری است. اگرچه این اثر ممکن است به دلایل زیادی رخ دهد (به عنوان مثال، مزایای تمرین در طول زمان کاهش می‌یابد)، اما نکته قابل توجه این است که توسط یک مدل ساده پیش‌بینی می‌شود که در آن بازنمایی‌های ارزش با یک مکانیسم ساده بر اساس تفاوت بین پاداش پیش‌بینی‌شده و پاداش به‌دست‌آمده به‌روزرسانی می‌شوند.
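The negatively accelerated learning curve follows directly from an error-driven update. The derivation below is a sketch that assumes the standard additive (Rescorla-Wagner style) form of the rule, a constant reward $R$ on every trial, an initial value $V_0 = 0$, and a fixed learning rate $\alpha$. The symbols $V_n$, $\delta_n$, $R$, and $\alpha$ are introduced here for illustration and do not appear in the text.

```latex
\begin{align*}
\delta_n &= R - V_n                      && \text{RPE on trial } n \\
V_{n+1}  &= V_n + \alpha\,\delta_n       && \text{value update, } 0 < \alpha \le 1 \\[4pt]
R - V_{n+1} &= (1-\alpha)\,(R - V_n)     && \text{substitute the update into the error} \\
\delta_n &= (1-\alpha)^{n} R             && \text{the error shrinks geometrically} \\
V_n      &= R\,\bigl[1 - (1-\alpha)^{n}\bigr] \\
V_{n+1} - V_n &= \alpha\,(1-\alpha)^{n} R && \text{each increment is smaller than the last}
\end{align*}
```

Because every increment is a fixed fraction of the remaining error, the earliest trials produce the largest changes in value and later trials contribute progressively less.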

REWARD AND PUNISHMENT

پاداش و جزا

Not all options are rewarding; just consider your dog’s response after he has tried to nudge a porcupine out of a rotting tree. Talk about prediction error! Are positive and negative reinforcers treated by the same or different systems? Although it may seem like it, punishment is not the withholding of a reward. Whereas the absence of an expected reward is coded by negative prediction errors, punishment involves the experience of something aversive, like a shock or a nose full of porcupine quills. Aversive events are the opposite of rewarding events in that they are unpleasant and motivate one to avoid them in the future.

همه گزینه‌ها پاداش‌دهنده نیستند؛ فقط واکنش سگ خود را بعد از اینکه سعی کرده یک جوجه‌تیغی را از درختی پوسیده بیرون براند در نظر بگیرید. این را می‌گویند خطای پیش‌بینی! آیا تقویت‌کننده‌های مثبت و منفی توسط سیستم‌های یکسان پردازش می‌شوند یا سیستم‌های متفاوت؟ اگرچه ممکن است این‌طور به نظر برسد، تنبیه همان دریغ کردن پاداش نیست. در حالی که نبود پاداش مورد انتظار با خطاهای پیش‌بینی منفی رمزگذاری می‌شود، تنبیه شامل تجربه چیزی آزارنده است، مانند شوک یا بینی پر از تیغ جوجه‌تیغی. رویدادهای آزارنده نقطه مقابل رویدادهای پاداش‌دهنده هستند، زیرا ناخوشایندند و فرد را برمی‌انگیزند تا در آینده از آنها اجتناب کند.

In one important respect, however, reinforcement and punishment are similar: They are both motivationally salient, the kinds of events that draw our attention and engage control processes to influence behavior. The role of dopamine in aversive events has been difficult to pin down. Some studies show increases in dopamine activity, others find decreases, and some find both within the same study. Can these findings be reconciled?

با این حال، از یک جنبه مهم، تقویت و تنبیه شبیه به هم هستند: هر دو از نظر انگیزشی برجسته‌اند، یعنی از نوع رویدادهایی هستند که توجه ما را جلب می‌کنند و فرآیندهای کنترلی را برای تأثیرگذاری بر رفتار درگیر می‌کنند. تعیین نقش دوپامین در رویدادهای آزارنده دشوار بوده است. برخی مطالعات افزایش فعالیت دوپامین را نشان می‌دهند، برخی دیگر کاهش آن را می‌یابند و برخی هر دو را در یک مطالعه پیدا می‌کنند. آیا این یافته‌ها قابل آشتی هستند؟

The habenula, a structure located within the dorsal thalamus, is in a good position to represent emotional and motivational events because it receives inputs from the forebrain limbic regions and sends inhibitory projections to dopamine neurons in the substantia nigra pars compacta. Masayuki Matsumoto and Okihide Hikosaka (2007) recorded from neurons in the lateral habenula and dopaminergic neurons in the SN, while monkeys made saccadic eye movements to a target that was either to the left or to the right of a fixation point. A saccade to one target was associated with a juice reward, and a saccade to the other target resulted in non-reinforcement.
Habenula neurons became active when the saccade was to the no-reward side and were suppressed if the saccade was to the reward side. DA neurons showed the opposite profile: They were excited by the reward-predicting targets and suppressed by the targets predicting no reward. Even weak electrical stimulation of the habenula elicited strong inhibition in DA neurons, suggesting that reward-related activity of the DA neurons may be regulated by input from the lateral habenula.

هابنولا، ساختاری که در تالاموس پشتی قرار دارد، در موقعیت خوبی برای بازنمایی رویدادهای هیجانی و انگیزشی قرار دارد، زیرا ورودی‌هایی از نواحی لیمبیک پیش‌مغز دریافت می‌کند و پروجکشن‌های مهاری به نورون‌های دوپامین در بخش متراکم جسم سیاه می‌فرستد. Masayuki Matsumoto و Okihide Hikosaka (2007) از نورون‌های هابنولای جانبی و نورون‌های دوپامینرژیک در SN ثبت کردند، در حالی که میمون‌ها حرکات ساکادیک چشم را به سمت هدفی انجام می‌دادند که در سمت چپ یا راست نقطه تثبیت بود. ساکاد به یک هدف با پاداش آب میوه همراه بود و ساکاد به هدف دیگر بدون تقویت می‌ماند.
نورون‌های هابنولا زمانی فعال شدند که ساکاد به سمت بدون پاداش بود و اگر ساکاد به سمت پاداش بود سرکوب می‌شدند. نورون‌های DA الگوی مخالفی را نشان دادند: با اهداف پیش‌بینی‌کننده پاداش برانگیخته و با اهدافی که نبود پاداش را پیش‌بینی می‌کردند سرکوب می‌شدند. حتی تحریک الکتریکی ضعیف هابنولا باعث مهار قوی در نورون‌های DA شد، که نشان می‌دهد فعالیت مرتبط با پاداش نورون‌های DA ممکن است با ورودی از هابنولای جانبی تنظیم شود.

Value is in one sense relative. If given a 50-50 chance to win $100 or $10, we would be disappointed to get only $10. If the game were changed, however, so that we stood to win either $10 or $1, we’d be thrilled to get the $10. Habenula neurons show a similar context dependency. If two actions result in either juice or nothing, the habenula is active when the nothing choice is made. But if the two actions result in either nothing or an aversive puff of air to the eye, the habenula is active only when the animal makes the response that results in the puff.

ارزش از یک جهت نسبی است. اگر شانس ۵۰-۵۰ برای بردن ۱۰۰ یا ۱۰ دلار به ما داده شود، از دریافت تنها ۱۰ دلار ناامید خواهیم شد. با این حال، اگر بازی تغییر کند، به طوری که بتوانیم ۱۰ دلار یا ۱ دلار برنده شویم، از دریافت ۱۰ دلار هیجان‌زده می‌شویم. نورون‌های هابنولا وابستگی مشابهی به زمینه نشان می‌دهند. اگر دو عمل به آب میوه یا هیچ منجر شوند، هابنولا زمانی فعال است که عملِ منتهی به «هیچ» انتخاب شود. اما اگر این دو عمل به هیچ یا یک پف آزارنده هوا به چشم منجر شوند، هابنولا تنها زمانی فعال است که حیوان پاسخی بدهد که به پف هوا منجر می‌شود.

This context dependency is also seen in DA responses. In our hypothetical game, we might imagine that the expected reward in the first pairing is $55 (the average of $100 and $10), whereas in the second pairing it is only $5.50. The $10 outcome results in a positive RPE in one case and a negative RPE in the other. In sum, there are many computational similarities between how we respond to rewards and punishments, and this finding may reflect the interaction between the habenula and the dopamine system.

این وابستگی زمینه نیز در پاسخ‌های DA دیده می‌شود. در بازی فرضی ما، ممکن است تصور کنیم که پاداش مورد انتظار در جفت اول ۵۵ دلار است (میانگین ۱۰۰ دلار و ۱۰ دلار)، در حالی که در جفت دوم فقط ۵.۵۰ دلار است. نتیجه ۱۰ دلاری منجر به RPE مثبت در یک مورد و RPE منفی در مورد دیگر می‌شود. در مجموع، شباهت‌های محاسباتی زیادی بین نحوه واکنش ما به پاداش‌ها و مجازات‌ها وجود دارد، و این یافته ممکن است منعکس کننده تعامل بین‌هابنولا و سیستم دوپامین باشد.
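The arithmetic behind this context dependency is easy to check. The snippet below is a minimal sketch in which the helper names `expected_value` and `rpe` are made up for illustration and the two outcomes in each game are assumed to be equally likely, as in the 50-50 example.

```python
def expected_value(outcomes):
    """Mean payoff when every listed outcome is equally likely."""
    return sum(outcomes) / len(outcomes)


def rpe(obtained, outcomes):
    """Reward prediction error: obtained payoff minus expected payoff."""
    return obtained - expected_value(outcomes)


print(expected_value([100, 10]), rpe(10, [100, 10]))  # 55.0 -45.0
print(expected_value([10, 1]), rpe(10, [10, 1]))      # 5.5 4.5
```

The same $10 payoff yields an error of -45 in the first game and +4.5 in the second, so its sign depends entirely on what else could have happened.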

In general, fMRI studies lack the spatial resolution to measure activity in small brainstem regions such as the VTA or lateral habenula. Nonetheless, researchers can ask similar questions about the similarity of neural regions in coding positive and negative outcomes. In one study, Ben Seymour and his colleagues (2007) paired different cues with possible financial outcomes that signaled a gain versus nothing, a loss versus nothing, or a gain versus a loss (Figure 12.18a). Study participants did not make choices in this experiment; they simply viewed the choices, and the computer determined the outcome. Positive and negative RPEs of gains and losses were both correlated with activity in the ventral striatum, but the specific ventral striatal region differed for the two conditions. Gains were encoded in the more anterior regions, and losses in the more posterior regions (Figure 12.18b). A region in the insula also responded to prediction error, but only when the choice resulted in a loss.

به طور کلی، مطالعات fMRI فاقد وضوح فضایی لازم برای اندازه گیری فعالیت در مناطق کوچک ساقه مغز مانند VTA یا هابنولای جانبی هستند. با این وجود، محققان می‌توانند سوالات مشابهی در مورد شباهت مناطق عصبی در کدگذاری پیامدهای مثبت و منفی بپرسند. در یک مطالعه، بن سیمور و همکارانش (۲۰۰۷) نشانه‌های متفاوتی را با پیامدهای مالی احتمالی جفت کردند که نشان دهنده سود در مقابل هیچ، ضرر در مقابل هیچ، یا سود در مقابل ضرر بود (شکل ۱۲.18a). شرکت کنندگان در این آزمایش انتخابی انجام ندادند؛ آنها فقط گزینه‌ها را مشاهده کردند و کامپیوتر نتیجه را تعیین کرد. RPEهای مثبت و منفی سود و زیان هر دو با فعالیت در جسم مخطط شکمی همبسته بودند، اما ناحیه خاص جسم مخطط شکمی برای این دو شرایط متفاوت بود. سودها در نواحی قدامی‌تر و ضررها در نواحی خلفی‌تر کدگذاری شدند (شکل ۱۲.18b). ناحیه‌ای در اینسولا نیز به خطای پیش بینی پاسخ داد، اما تنها زمانی که انتخاب به ضرر منجر می‌شد.

Alternative Views of Dopamine Activity

دیدگاه‌های جایگزین از فعالیت دوپامین

The RPE story elegantly accounts for the role of dopaminergic cells in reinforcement and learning, but there remain viable alternative hypotheses. Kent Berridge (2007) argues that dopamine release is the result, not the cause, of learning. He points out a couple of problems with the notion that dopamine acts as a learning signal. First, mice that are genetically unable to synthesize dopamine can still learn (Cannon & Bseikri, 2004; Cannon & Palmiter, 2003). Second, genetically mutant mice with high dopamine levels do not learn any faster, nor do they maintain habits longer, than mice with normal levels of dopamine.

داستان RPE به زیبایی نقش سلول‌های دوپامینرژیک را در تقویت و یادگیری به حساب می‌آورد، اما فرضیه‌های جایگزین قابل قبولی وجود دارد. کنت بریج (۲۰۰۷) استدلال می‌کند که آزاد شدن دوپامین نتیجه یادگیری است نه علت. او به چند مشکل با این تصور اشاره می‌کند که دوپامین به عنوان یک سیگنال یادگیری عمل می‌کند. اول، موش‌هایی که از نظر ژنتیکی قادر به سنتز دوپامین نیستند، هنوز هم می‌توانند یاد بگیرند (کانن و بسیکری، ۲۰۰۴؛ کانن و پالمیتر، ۲۰۰۳). دوم، موش‌های جهش یافته ژنتیکی با سطوح دوپامین بالا نسبت به موش‌های با سطوح طبیعی دوپامین سریع‌تر یاد نمی‌گیرند و عادت‌هایشان را برای مدت طولانی‌تری حفظ نمی‌کنند.


FIGURE 12.18 Coding of gain and loss in the ventral striatum with fMRI.
(a) People were presented with one of four cues: A, B, C, or D. Over time, they learned that each cue was associated with one of two possible outcomes (or, for Cue A, the same neutral outcome). (b) Prediction errors reliably predicted the BOLD response in the ventral striatum, with the center of the positive RPE response (green) slightly anterior to the center of the negative RPE response (red).

شکل ۱۲.۱۸ کدگذاری سود و زیان در جسم مخطط شکمی با fMRI.
(الف) به افراد یکی از چهار نشانه ارائه شد: A، B، C، یا D. با گذشت زمان، آنها یاد گرفتند که هر نشانه با یکی از دو پیامد ممکن (یا برای نشانه A، همان نتیجه خنثی) مرتبط است. (ب) خطاهای پیش بینی به طور قابل اعتمادی پاسخ BOLD را در جسم مخطط شکمی‌با مرکز پاسخ مثبت RPE (سبز) کمی‌جلوتر از مرکز پاسخ منفی RPE (قرمز) پیش بینی کردند.

Given these puzzles, Berridge suggests that dopamine neurons do not cause learning by encoding RPEs. Instead, they code the informational consequences of prediction and learning (generated elsewhere in the brain) and then do something with the information. He proposes that dopamine activity is indicative of the salience of a stimulus or an event.

با توجه به این معماها، بریج پیشنهاد می‌کند که نورون‌های دوپامین با رمزگذاری RPE باعث یادگیری نمی‌شوند. در عوض، پیامدهای اطلاعاتی پیش‌بینی و یادگیری (که در جای دیگری از مغز ایجاد می‌شود) را کدگذاری می‌کنند و سپس کاری را با اطلاعات انجام می‌دهند. او پیشنهاد می‌کند که فعالیت دوپامین نشان دهنده برجسته بودن یک محرک یا یک رویداد است.

Berridge describes a reward as made up of three dissociable components: wanting, learning, and liking. His view is that dopamine mediates only the “wanting” component. Dopamine activity indicates that something is worth paying attention to, and when these things are associated with reward, the dopamine activity reflects how desirable the object is. The distinction between wanting and liking may seem subtle, but it can have serious implications when we consider things like drug abuse.

بریج یک پاداش را از سه جزء قابل تفکیک توصیف می‌کند: خواستن، یادگیری و دوست داشتن. نظر او این است که دوپامین فقط جزء “خواستن” را واسطه می‌کند. فعالیت دوپامین نشان می‌دهد که چیزی ارزش توجه دارد، و وقتی این چیزها با پاداش همراه است، فعالیت دوپامین نشان می‌دهد که آن شی چقدر مطلوب است. تمایز بین خواستن و دوست داشتن ممکن است ظریف به نظر برسد، اما وقتی مواردی مانند سوء مصرف مواد را در نظر می‌گیریم می‌تواند پیامدهای جدی داشته باشد.

In one experiment, cocaine users were given a drug that lowered their dopamine levels (Leyton et al., 2005). In the lowered dopamine state, cues indicating the availability of cocaine were rated as less desirable. However, the users’ feelings of euphoria in response to cocaine and their rate of self-administration were unaffected. That is, with reduced dopamine, study participants still liked cocaine in the same way (reinforcement was unchanged), even though they didn’t particularly want it.

در یک آزمایش، به مصرف کنندگان کوکائین دارویی داده شد که سطح دوپامین آنها را کاهش داد (لیتون و همکاران، ۲۰۰۵). در حالت کاهش دوپامین، نشانه‌هایی که در دسترس بودن کوکائین را نشان می‌دهند، کمتر مطلوب ارزیابی شدند. با این حال، احساس سرخوشی کاربران در پاسخ به کوکائین و میزان مصرف خود تحت تأثیر قرار نگرفت. یعنی با کاهش دوپامین، شرکت‌کنندگان در مطالعه همچنان کوکائین را به همان شیوه دوست داشتند (تقویت‌ها بدون تغییر بود)، حتی اگر به‌ویژه آن را نمی‌خواستند.

It is reasonable to suppose that dopamine serves multiple functions. Indeed, neurophysiologists have described two classes of responses when recording from DA neurons in the brainstem (Matsumoto & Hikosaka, 2009). One subset of DA neurons responded in terms of valence. These cells increased their firing rate to stimuli that were predictive of reward and decreased their firing rate to aversive stimuli (Figure 12.19a). A greater number of DA neurons, however, were excited by salience, that is, by the increased likelihood of any reinforcement, independent of whether it was a reward or a punishment, and especially when it was unpredictable (Figure 12.19b).

منطقی است که فرض کنیم دوپامین چندین عملکرد دارد. در واقع، نوروفیزیولوژیست‌ها هنگام ثبت از نورون‌های DA در ساقه مغز دو دسته پاسخ را توصیف کرده‌اند (Matsumoto & Hikosaka، ۲۰۰۹). یک زیرمجموعه از نورون‌های DA بر اساس ظرفیت هیجانی (valence) پاسخ دادند. این سلول‌ها نرخ شلیک خود را در پاسخ به محرک‌های پیش‌بینی‌کننده پاداش افزایش و در پاسخ به محرک‌های آزارنده کاهش دادند (شکل ۱۲.19a). با این حال، تعداد بیشتری از نورون‌های DA با برجستگی برانگیخته شدند، یعنی با افزایش احتمال هر گونه تقویت، مستقل از اینکه پاداش باشد یا تنبیه، و به‌ویژه زمانی که غیرقابل پیش‌بینی بود (شکل ۱۲.19b).

The first response class is similar to what would be expected of neurons coding prediction errors; the second, to what would be expected of neurons signaling things that require attention. Interestingly, the valence neurons were located more ventromedially in the substantia nigra and VTA, in areas that project to the ventral striatum and are part of a network involving orbitofrontal cortex. In contrast, the neurons excited by salience were located more dorsolaterally in the substantia nigra, in regions with projections to the dorsal striatum and a network of cortical areas associated with the control of action and orientation.

پاسخ دسته اول مشابه چیزی است که از نورون‌های کدکننده خطاهای پیش‌بینی انتظار می‌رود؛ پاسخ دسته دوم مشابه چیزی است که از نورون‌هایی انتظار می‌رود که چیزهای نیازمند توجه را علامت می‌دهند. جالب اینجاست که نورون‌های ظرفیت هیجانی بیشتر در بخش شکمی‌میانی جسم سیاه و VTA قرار داشتند، در مناطقی که به جسم مخطط شکمی پروجکت می‌کنند و بخشی از شبکه‌ای شامل قشر اوربیتوفرونتال هستند. در مقابل، نورون‌های برانگیخته‌شده با برجستگی بیشتر در بخش پشتی‌جانبی جسم سیاه قرار داشتند، در مناطقی با پروجکشن‌هایی به جسم مخطط پشتی و شبکه‌ای از نواحی قشری مرتبط با کنترل عمل و جهت‌گیری.

We can see that when damage occurs within the dopamine system, or when downstream structures in the cortex are compromised, control problems will be reflected in behavioral changes related to motivation, learning, reward valuation, and emotion. These observations bring us back to how frontal lobe control systems are at work in both decision-making and goal-oriented behavior.

می‌توانیم ببینیم که وقتی آسیب در سیستم دوپامین رخ می‌دهد، یا زمانی که ساختارهای پایین‌دستی در قشر آسیب می‌بینند، مشکلات کنترلی در تغییرات رفتاری مرتبط با انگیزه، یادگیری، ارزش‌گذاری پاداش و احساسات منعکس می‌شوند. این مشاهدات ما را به نحوه عملکرد سیستم‌های کنترل لوب فرونتال هم در تصمیم گیری و هم در رفتار هدف گرا بازمی‌گرداند.


FIGURE 12.19 Two classes of dopamine neurons.
(a) Response profile of DA neurons that code valence. These neurons increase their firing rate as the probability of a positive outcome increases and decrease their firing rate as the probability of a negative outcome increases. (b) Response profile of DA neurons coding salience. These neurons increase their firing rate as reinforcement probability increases, independent of whether the reinforcement is positive or negative, signaling that the stimulus is important (or predictive).

شکل ۱۲.۱۹ دو دسته از نورون‌های دوپامین.
(الف) نمایه پاسخ نورون‌های DA که ظرفیت را کد می‌کنند. این نورون‌ها با افزایش احتمال یک نتیجه مثبت، سرعت شلیک خود را افزایش می‌دهند و با افزایش احتمال نتیجه منفی، سرعت شلیک خود را کاهش می‌دهند. (ب) مشخصات پاسخ نورون‌های DA که برجستگی را کد می‌کنند. این نورون‌ها با افزایش احتمال تقویت، نرخ شلیک خود را افزایش می‌دهند، مستقل از مثبت یا منفی بودن تقویت، که نشان دهنده مهم بودن (یا پیش‌بینی‌کننده) محرک است.
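The two response profiles in Figure 12.19 can be contrasted with a toy model. Everything numeric here is an assumption made for illustration: the linear form, the `baseline` of 5 spikes/s, and the `gain` of 4 are hypothetical and are not taken from the recordings. The only feature meant to carry over is the sign pattern, with valence neurons treating reward and punishment in opposite directions and salience neurons treating them alike.

```python
def valence_response(prob, outcome, baseline=5.0, gain=4.0):
    """Firing rises with reward probability and falls with punishment probability."""
    sign = 1.0 if outcome == "reward" else -1.0
    return baseline + sign * gain * prob


def salience_response(prob, outcome, baseline=5.0, gain=4.0):
    """Firing rises with reinforcement probability regardless of its sign."""
    return baseline + gain * prob  # outcome is ignored: only likelihood matters


for outcome in ("reward", "punishment"):
    for prob in (0.0, 0.5, 1.0):
        print(outcome, prob,
              valence_response(prob, outcome),
              salience_response(prob, outcome))
```

With these settings a certain punishment drives the valence profile below baseline (1.0) but drives the salience profile above it (9.0), the same level reached for a certain reward.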

TAKE-HOME MESSAGES

پیام‌های کلیدی

▪️ A decision involves the selection of one option among several. It typically involves an evaluation of the expected outcome (reward) associated with each option.

▪️ یک تصمیم شامل انتخاب یک گزینه از میان چندین گزینه است. معمولاً شامل ارزیابی نتیجه مورد انتظار (پاداش) مرتبط با هر گزینه است.

▪️ The subjective value of an item is made up of multiple variables that include payoff amount, context, probability, effort/cost, temporal discounting, novelty, and preference.

▪️ ارزش ذهنی یک آیتم از متغیرهای متعددی تشکیل شده است که شامل مقدار بازده، زمینه، احتمال، تلاش/هزینه، تنزیل زمانی، تازگی و ترجیح است.

▪️ Single-cell recordings in monkeys and fMRI studies in humans have implicated frontal regions, including the orbitofrontal cortex, in value representation.

▪️ ثبت‌های تک سلولی در میمون‌ها و مطالعات fMRI در انسان نشان داده‌اند که نواحی فرونتال، از جمله قشر اوربیتوفرونتال، در بازنمایی ارزش نقش دارند.

▪️ Reward prediction error (RPE) is the difference between the expected reward and what is actually obtained. The RPE is used as a learning signal to update value information as expectancies and the valence of rewards change. The activity of some DA neurons provides a neuronal code of prediction errors.

▪️خطای پیش بینی پاداش (RPE) تفاوت بین پاداش مورد انتظار و چیزی است که واقعاً به دست آمده است. RPE به عنوان یک سیگنال یادگیری برای به روز رسانی اطلاعات ارزش با تغییر انتظارات و ظرفیت پاداش‌ها استفاده می‌شود. فعالیت برخی از نورون‌های DA یک کد عصبی از خطاهای پیش بینی را ارائه می‌دهد.

▪️ DA neurons also appear to code other variables that may be important for goal-oriented behavior and decision making, such as signaling the salience of information in the environment.

▪️ همچنین به نظر می‌رسد که نورون‌های DA متغیرهای دیگری را کدگذاری می‌کنند که ممکن است برای رفتار هدف‌گرا و تصمیم‌گیری مهم باشند، مانند سیگنال‌دهی برجستگی اطلاعات در محیط.
