COMIC CARTOON GENERATION
Since today is April Fool’s Day, let me share some of my results on automated comic cartoon generation.



Last time, I shared how Pavel Shtykovskiy and I wrote a paper on Humor Mechanics, published at ICCC-2024, and how Alexey Ivanov and I launched HUMOR-ARENA to collect human labels and improve automated humor generation and ranking. After that, I decided to check whether AI can be used to generate and filter proper cartoons (based on one-liners I had previously generated with AI).



First, not every one-liner (even a good one) can serve as the basis for a cartoon. Some jokes are too abstract, and some wordplay cannot be adequately visualized. That means we need an automated way to tell whether a given joke is a good starting point. I collected some examples of good and bad ones and asked a reasoning model (o1) to generate an instruction, a guide. It gave me specific rules, including checks for Visual Clarity, Concrete Elements, Scene Foundation, and so on. So I took our top generated jokes (by Humor-Arena rating) and filtered them with claude-3.5-sonnet + o3-mini, both armed with that visual instruction. If either model thinks the joke is bad for visualizing, we reject it. That leaves us with 25% of the top jokes.
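The "reject if any model objects" rule is a unanimous vote. Here is a minimal sketch of it in Python. The names `filter_visualizable`, `mentions_object`, and `short_enough` are made up for illustration, and the two stand-in judges are toy heuristics, not the actual model calls or the o1-written guide.

```python
from typing import Callable, List, Sequence

# A judge takes (guide, joke) and returns True if the joke looks drawable.
# In a real pipeline each judge would wrap a call to a model; here it is
# just a plain callable so the voting logic can be shown on its own.
Judge = Callable[[str, str], bool]


def filter_visualizable(
    jokes: Sequence[str], judges: Sequence[Judge], guide: str
) -> List[str]:
    """Keep a joke only if every judge accepts it (one rejection is a veto)."""
    return [joke for joke in jokes if all(judge(guide, joke) for judge in judges)]


# Hypothetical stand-in judges: they ignore the guide and call no model.
def mentions_object(guide: str, joke: str) -> bool:
    return any(word in joke.lower() for word in ("cat", "dog", "robot"))


def short_enough(guide: str, joke: str) -> bool:
    return len(joke.split()) <= 12


if __name__ == "__main__":
    sample = [
        "My cat filed a complaint with HR.",
        "Time is an illusion, lunchtime doubly so.",
    ]
    print(filter_visualizable(sample, [mentions_object, short_enough], "guide text"))
```

Requiring agreement from both judges trades recall for precision, which fits a setup where only a quarter of the top jokes survive.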



Next, we need to generate the cartoons. (Note: this was done before the recent releases of new image-gen LLMs, so now it should be even easier.)
For generation, I used a pair of models, o3-mini + DALLE-3; the trick is to provide enough details to develop a recognizable style and make a funny cartoon. Since I aimed to match a specific visual style, resembling classics like the New Yorker’s or Floyd Gottfredson’s, I took a bunch of examples and reverse-engineered a generalized visual style description.



As for funny image creation, vanilla o3-mini wasn’t creative enough to come up with interesting details without hints, so, again, I used a superior model (o1) to generate meta-instructions, a guide on how to create an interesting cartoon based on a given one-liner.
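Roughly, the two stages fit together like this: a text model turns the joke plus the guide into a scene description, and the style description is attached before the result goes to the image model. This is only a sketch of that shape. `build_image_prompt` and `describe_scene` are hypothetical names, the prompt wording is invented, and the stub below stands in for the real text model.

```python
from typing import Callable


def build_image_prompt(
    joke: str,
    style: str,
    cartoon_guide: str,
    describe_scene: Callable[[str], str],
) -> str:
    """Compose the final image prompt from a style description and a scene.

    describe_scene is whatever text model writes the scene; it receives the
    guide plus the joke and returns a description of what to draw.
    """
    scene = describe_scene(f"{cartoon_guide}\n\nJoke: {joke}")
    # Style goes first so it frames the whole image, then the concrete scene.
    return f"{style.strip()}\n\nScene: {scene.strip()}"


if __name__ == "__main__":
    # Stub scene writer for illustration only; no model is called.
    def fake_writer(request: str) -> str:
        return "A cat in a tie hands a form to a bored office clerk."

    prompt = build_image_prompt(
        joke="My cat filed a complaint with HR.",
        style="Black-and-white ink cartoon, single panel.",
        cartoon_guide="Pick one concrete scene with a clear visual gag.",
        describe_scene=fake_writer,
    )
    print(prompt)
```

Keeping the style text separate from the scene text means the same style can be reused across every joke.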



That gave me a lot of cartoons, and I personally found some of them (I’d say 20–30%) pretty good. Using Cursor, I quickly sketched a script to resize the image and add the text of the original joke at the bottom, in Comic Sans, you know.
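The layout part of such a script is simple arithmetic: scale the image to a fixed width, wrap the joke into lines, and extend the canvas downward to fit them. Below is a sketch of just that calculation using only the standard library. It is not the script I actually used: `plan_caption_layout` and all the default numbers are assumptions, and the real drawing would still need an imaging library such as Pillow.

```python
import textwrap
from typing import List, Tuple

Size = Tuple[int, int]


def plan_caption_layout(
    width: int,
    height: int,
    caption: str,
    target_width: int = 1024,   # assumed output width in pixels
    chars_per_line: int = 48,   # assumed wrap width for the chosen font size
    line_height: int = 40,      # assumed pixel height of one text line
    padding: int = 20,          # assumed space above and below the caption
) -> Tuple[Size, List[str], Size]:
    """Return (resized image size, wrapped caption lines, final canvas size)."""
    # Scale to the target width while preserving the aspect ratio.
    scale = target_width / width
    new_height = round(height * scale)

    # Break the joke into lines that should fit the caption strip.
    lines = textwrap.wrap(caption, width=chars_per_line)

    # The caption strip sits below the image and grows with the line count.
    caption_height = 2 * padding + line_height * len(lines)
    canvas = (target_width, new_height + caption_height)
    return (target_width, new_height), lines, canvas


if __name__ == "__main__":
    print(plan_caption_layout(2048, 1024, "My cat filed a complaint with HR."))
```

Wrapping by character count is crude for a proportional font, so a real script would more likely measure the rendered text width instead.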



Neat!