Human exclusion from creative tasks worsens quality and worker satisfaction: new research study
What does the future of human-AI collaboration in work look like? We find that human participation in the most creative parts of work is critical for quality, satisfaction, and creative diversity.
AI integration in the workplace is poised to redefine the future of work, yet our understanding of how to design effective human-AI partnerships remains limited. While AI will ultimately automate significant parts of work, human-in-the-loop (HITL) systems are a key social and economic goal. This is partly because many tasks that require creativity, empathy, and judgment remain challenging for AI to perform alone, and partly because of social alignment goals such as the need to design workplaces that drive productivity while leaving room for human relevance. Ultimately, understanding how to balance human ingenuity with machine efficiency is not just a technological challenge but a profound design question for the future of work.
We recently conducted one of the first studies to examine how human-AI collaboration design affects human productivity, satisfaction, quality of work, and diversity of creative output.
Designing Creative Work
The creative process is commonly divided into two phases: a design (pre-production) phase involving ideation and planning, and a production phase involving implementing the plan and then reviewing and refining the result. We study human-AI collaboration in the context of creative writing. In creative writing, the design phase consists of generating ideas and producing a story outline, whereas the production phase consists of drafting and editing.
Study Design
On Day 1, participants completed an initial survey assessing their English proficiency, creative writing skills, and experience. They then wrote a 1,000-word story without any AI assistance, following a seven-sequence structure commonly used in fiction writing. On Day 2, participants wrote a new 1,000-word story using ChatGPT 3.5, following their assigned collaboration model. Participants were randomly assigned to one of three AI collaboration models, which reflect some of the common ways in which we see AI being integrated into work in practice:
Human Confirmation, where the AI handles all tasks and humans confirm or reject the output.
Human Creativity, where humans perform pre-production (i.e., ideation and outlining) without AI assistance, while AI executes the production phase (i.e., drafting and editing), with humans confirming or rejecting AI output as in the Human Confirmation design.
Copilot, where humans and AI collaborate throughout the entire process from ideation all the way to editing.
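For readers who want to run a similar experiment, random assignment to the three designs can be sketched in a few lines of Python. This is only an illustration, not the procedure used in the study: the function name `assign_designs`, the participant IDs, the fixed seed, and the choice to balance group sizes are all assumptions made for the example.

```python
import random

# The three collaboration designs named in the post.
DESIGNS = ["Human Confirmation", "Human Creativity", "Copilot"]


def assign_designs(participant_ids, seed=0):
    """Randomly assign participants to designs in near-equal group sizes.

    Balanced groups are an assumption of this sketch; the post only says
    that assignment was random.
    """
    rng = random.Random(seed)  # fixed seed so the example is reproducible
    shuffled = list(participant_ids)
    rng.shuffle(shuffled)
    # Deal the shuffled participants round-robin across the three designs.
    return {pid: DESIGNS[i % len(DESIGNS)] for i, pid in enumerate(shuffled)}


if __name__ == "__main__":
    # Hypothetical participant IDs, for illustration only.
    assignment = assign_designs([f"P{n:02d}" for n in range(1, 10)])
    for pid in sorted(assignment):
        print(pid, assignment[pid])
```

Shuffling first and then dealing round-robin keeps the groups the same size (to within one participant) while leaving each person's design to chance.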
Completion Time
With AI assistance, total completion time went down by 36.2%, with the largest gains during the drafting stage (a 68.6% reduction). We observed no statistically significant differences in total completion time across the collaboration designs, so all three models have the potential to deliver big productivity gains.
Writing Quality
Without AI assistance (Day 1), overall story quality was not statistically different across groups (see figure below). With AI collaboration (Day 2), the Human Confirmation group, where AI managed creative tasks, produced lower-quality stories than the Human Creativity and Copilot groups, where humans performed or collaborated in creative tasks. There were no significant differences between the latter two groups. In short, human involvement in the early creative aspects of writing (ideation and structuring) enhances overall quality when using AI assistance.
In Figure B below, the X-axis represents the quality of stories on Day 1; in other words, it is a measure of writer skill level. The Y-axis is the story quality on Day 2 with AI assistance. AI reduces the quality differential between high- and low-skilled writers, as seen from the Day 2 versus Day 1 plot (slopes << 1). This aligns with other studies that suggest that AI bridges skill-based performance gaps. The Human Confirmation design, which minimized human creative input, shows the lowest slope (not significantly different from zero, implying that skill-based differences in story quality almost disappeared). Notably, the lower slope with Human Confirmation was driven by higher-skilled writers being less effective in this design (they fared worse with it than with the other two designs), not by lower-skilled writers benefiting more from it. These results suggest that allowing active human involvement in creative tasks provides more opportunity for high-skilled writers to excel.
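To make the slope interpretation concrete, here is a minimal sketch of how a Day 2 versus Day 1 slope could be computed with ordinary least squares. The scores below are invented purely for illustration and are not data from the study; the helper `ols_slope` and the two hypothetical score lists are assumptions of this example, and the paper's actual estimation method may differ.

```python
def ols_slope(x, y):
    """Least-squares slope of y regressed on x (with an intercept)."""
    if len(x) != len(y) or len(x) < 2:
        raise ValueError("need two equal-length lists with at least 2 points")
    mean_x = sum(x) / len(x)
    mean_y = sum(y) / len(y)
    sxy = sum((xi - mean_x) * (yi - mean_y) for xi, yi in zip(x, y))
    sxx = sum((xi - mean_x) ** 2 for xi in x)
    if sxx == 0:
        raise ValueError("x values are all identical; slope is undefined")
    return sxy / sxx


# Invented Day 1 quality scores (a proxy for writer skill), NOT study data.
day1 = [2, 4, 6, 8]
# Hypothetical design where half of each skill gap carries over to Day 2.
day2_partial = [5, 6, 7, 8]
# Hypothetical design where Day 2 quality no longer depends on skill.
day2_flat = [5, 5, 5, 5]

if __name__ == "__main__":
    print(ols_slope(day1, day2_partial))  # 0.5
    print(ols_slope(day1, day2_flat))     # 0.0
```

A slope of 1 would mean that Day 1 skill gaps carry over fully to Day 2, a slope between 0 and 1 means the gaps shrink, and a slope near 0 means Day 2 quality barely depends on the writer's starting skill.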
User Satisfaction
To measure participant satisfaction, we asked about overall satisfaction with the collaboration model, whether the design offered enough flexibility, whether participants felt the process was effective for them, and whether they would reuse the process. The Human Creativity and Copilot groups reported higher overall satisfaction, flexibility, process effectiveness, and intent to reuse the process than the Human Confirmation group. When comparing AI-assisted to human-only writing (i.e., Day 2 vs Day 1), the Human Creativity and Copilot groups experienced increased satisfaction, effectiveness, and intent to reuse the process; the Human Confirmation group showed no improvement in satisfaction.
Creative Diversity
Next, we measured the diversity of stories written by participants. In the absence of AI support, our participants gravitated primarily towards writing Young Adult stories, which simply reflected their age (students). In contrast, when they had AI support, we saw an increase in the frequency of other genres such as Fantasy, Mystery, and Sci-Fi. In short, AI helped individual writers explore new writing styles and genres, which is a big plus.
An analysis of story similarity shows that AI-assisted writing leads to increased semantic similarity among stories within treatment groups. The Human Confirmation group, where AI had the greatest role, showed the highest similarity, followed by Copilot, Human Creativity, and writing without AI (see figure below). Even though individuals explore more with AI, different users working with AI tend to explore in similar directions, which reduces aggregate creative diversity. In short, excessive use of AI by workers in creative tasks will reduce creative diversity in organizations. But human involvement in creative tasks mitigates this effect, with the Human Creativity model producing the most diverse stories among AI-assisted groups.
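One common way to quantify within-group similarity is the average cosine similarity over all pairs of stories in a group. The sketch below shows that calculation under a big simplifying assumption: it uses plain word counts as the story vectors, which captures overlap in wording rather than true semantic similarity. The post does not say which text representation the study used, so treat the helper names and the tiny example texts as hypothetical.

```python
import math
from collections import Counter
from itertools import combinations


def cosine(a, b):
    """Cosine similarity between two word-count vectors."""
    dot = sum(a[w] * b[w] for w in a if w in b)
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


def mean_pairwise_similarity(texts):
    """Average cosine similarity over every pair of texts in one group."""
    if len(texts) < 2:
        raise ValueError("need at least two texts to form a pair")
    # Word counts stand in for embeddings here (an assumption of this sketch).
    vectors = [Counter(t.lower().split()) for t in texts]
    pairs = list(combinations(vectors, 2))
    return sum(cosine(a, b) for a, b in pairs) / len(pairs)


# Tiny made-up "stories" for illustration; these are NOT study data.
SIMILAR_GROUP = [
    "a young wizard finds a hidden door",
    "a young wizard opens a hidden door",
    "a young witch finds a hidden door",
]
DIVERSE_GROUP = [
    "a young wizard finds a hidden door",
    "the detective questioned every passenger twice",
    "rain fell on the colony ship for weeks",
]

if __name__ == "__main__":
    print(mean_pairwise_similarity(SIMILAR_GROUP))
    print(mean_pairwise_similarity(DIVERSE_GROUP))
```

A higher average for a group means its stories resemble one another more, which is the sense in which the post describes the Human Confirmation group as the least diverse.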
Participant Feedback
Participants in the Human Confirmation group expressed feeling detached from their final stories, highlighted struggles with getting AI to produce content fully aligned with their vision, and found AI-generated stories lacking personal touch. For instance, one participant mentioned feeling "super removed from the content of the story," which led to a final product they were "far from proud of."
End Notes
Many recent studies have asked “Does AI help increase human productivity?” I think that answer is now clear and it’s time to switch our focus towards a different question: “How should we design the future of work with humans and AI to improve productivity and work quality while preserving work's meaning and value?” A central insight of our research is that the human creative role remains essential, even as AI capabilities advance. As you think about AI integration, please keep in mind that workflow design will have a big impact. The full paper is here.
Community Notes
Many of you have written to me about DeepSeek. There also appears to be a lot of misunderstanding about it. My next post (which I hope will not take more than a few days) will be a DeepSeek FAQ covering its implications for AI.
Even if the tacit knowledge of individuals can be induced within the training models … the creativity and the ingenuity of the human mind has thus far proven to be a tough challenge to be effectively introduced into any of the training data. Can the models be unleashed a bit more by tweaking the first principles?… yet to be seen! Until then data such as this one would stand to prove the inevitability of keeping humans especially in the fields requiring creativity or skill to think outside the box … in this case the proverbial box could effectively be replaced by data. 👍🏻
Excellent study design and analysis. Most useful.