System Design and Implementation


The work DEEP ALICE poses the technical challenge of real-time AI image generation driven by images produced by the interactors during the exhibition/festival. Alongside the images that serve as the initial input, prompts are generated automatically, in coordination with the images, from the books “Alice’s Adventures in Wonderland” and “Through the Looking-Glass” by Lewis Carroll. This flow of information processing, which passes through the real-time composition of images selected by the interactors and through the automatic generation of prompts, requires an orchestration of several software platforms and programming environments.

Acting as the brain that controls the entire course of the interaction, software programmed in MAX/MSP (a collection of modules with multiple functions; see the table below) captures the video images, produces and sends the automatically generated prompts, and transmits the captured image to another programming platform, Touch Designer (TD), which here serves as an interface to StreamDiffusion (SD), the system responsible for AI-based image generation. The MAX/MSP software controls variations in the AI image-generation parameters by sending them to TD, which in turn forwards them to StreamDiffusion. Behind the scenes, the two environments also exchange the AI-generated images: they are generated in TD and sent back to MAX/MSP, where real-time “post-production” is performed (image adjustments, image blending, and text/prompt overlays).

In addition to image processing, the MAX/MSP software communicates with two applications programmed in Python which, in turn, communicate with the OpenAI platform: one translates the prompts produced by MAX/MSP from English into Portuguese at each prompt generation (a minimal sketch follows below), and the other performs character inversion on the prompts during the deepest phase of the “dream.” It is worth noting that the Python programs were developed with AI assistance (ChatGPT).
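To make the hand-off between MAX/MSP and the Python translation helper concrete, here is a minimal sketch of such a bridge, assuming the python-osc library and the official openai client; the OSC addresses, ports, and model name are placeholders, not the installation’s actual configuration.

```python
# Minimal sketch of the MAX/MSP <-> OpenAI translation bridge.
# Addresses, ports, and the model name are assumptions, not the
# installation's actual configuration.
from openai import OpenAI                      # official OpenAI client
from pythonosc.dispatcher import Dispatcher    # routes incoming OSC addresses
from pythonosc.osc_server import BlockingOSCUDPServer
from pythonosc.udp_client import SimpleUDPClient

client = OpenAI()                              # reads OPENAI_API_KEY from the environment
reply = SimpleUDPClient("127.0.0.1", 9001)     # back-channel to MAX/MSP (assumed port)

def translate(address, prompt):
    """Receive an English prompt over OSC, send back its Portuguese translation."""
    resp = client.chat.completions.create(
        model="gpt-4o-mini",                   # assumed model; any chat model works
        messages=[
            {"role": "system",
             "content": "Translate the user's text from English to Portuguese. "
                        "Reply with the translation only."},
            {"role": "user", "content": prompt},
        ],
    )
    reply.send_message("/prompt/pt", resp.choices[0].message.content)

dispatcher = Dispatcher()
dispatcher.map("/prompt/en", translate)        # MAX/MSP sends prompts to this address

# Listen for prompts from MAX/MSP (assumed port).
BlockingOSCUDPServer(("127.0.0.1", 9000), dispatcher).serve_forever()
```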



MAX/MSP Module “DeepAlice CAPTURA”: captures camera images and sends them to Touch Designer.



MAX/MSP Module “DeepAlice PROMPT2”: generates prompts in real time from text fragments and sends them to Touch Designer.
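Purely as an illustration of this kind of fragment recombination (the module’s actual logic lives in the MAX/MSP patch), a Python sketch might look like the following; the file names, the sentence-splitting heuristic, and the joining strategy are all assumptions.

```python
# Sketch of the fragment-recombination idea behind "DeepAlice PROMPT2".
# File names and the splitting/joining heuristics are assumptions.
import random

def load_fragments(path):
    """Split a plain-text book into rough sentence fragments."""
    with open(path, encoding="utf-8") as f:
        text = f.read()
    return [s.strip() for s in text.split(".") if s.strip()]

# Both Carroll books feed the same fragment pool.
fragments = (load_fragments("alice_in_wonderland.txt")
             + load_fragments("through_the_looking_glass.txt"))

def make_prompt(n_segments=4):
    """Regroup n randomly chosen fragments into one new sentence."""
    return ", ".join(random.sample(fragments, n_segments)) + "."

print(make_prompt())
```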



Touch Designer Module: receives video images from MAX/MSP and processes them using the prompts from the “DeepAlice PROMPT2” module.
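Touch Designer performs the keying natively; just to show what a green Chroma Key computes, here is a minimal OpenCV sketch, with HSV thresholds that are illustrative assumptions rather than the work’s calibrated values.

```python
# Minimal green-screen keying sketch with OpenCV. Touch Designer does this
# natively in the installation; the thresholds here are assumptions.
import cv2

frame = cv2.imread("composition.png")                    # hypothetical CAM1 frame
hsv = cv2.cvtColor(frame, cv2.COLOR_BGR2HSV)             # keying on hue is more robust than on RGB
green = cv2.inRange(hsv, (35, 80, 80), (85, 255, 255))   # mask of "green enough" pixels

keyed = frame.copy()
keyed[green > 0] = 0                                     # knock out the green background
cv2.imwrite("keyed.png", keyed)
```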



MAX/MSP Module “DeepAlice MAESTRO”: performs overall control of the interaction dynamics and of the work’s state variations, sending parameters to and receiving them from the other modules via the OSC protocol.
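For readers unfamiliar with OSC, the parameter traffic MAESTRO orchestrates can be pictured as messages like the ones below; python-osc stands in for the MAX/MSP sender, and the port, OSC addresses, and parameter names are assumptions.

```python
# Sketch of MAESTRO-style parameter control over OSC.
# Port, OSC addresses, and parameter names are assumptions.
from pythonosc.udp_client import SimpleUDPClient

td = SimpleUDPClient("127.0.0.1", 7000)   # Touch Designer's OSC input (assumed port)

# Push a new prompt plus a few diffusion parameters; Touch Designer
# forwards them to StreamDiffusion.
td.send_message("/sd/prompt", "the Cheshire Cat, the Queen of Hearts, Alice")
td.send_message("/sd/guidance_scale", 1.2)
td.send_message("/sd/strength", 0.65)
td.send_message("/sd/seed", 42)
```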



Interaction State Table:

| States | User action | Software / operation | Result / image |
|---|---|---|---|
| 1 – Initial | Cutouts available outside the CAM1 field; green surface of table 1 in CAM1’s view. | The MAX/MSP “DeepAlice CAPTURA” module captures CAM1 and sends the adjusted image to Touch Designer (via SPOUT server); Touch Designer performs the green Chroma Key. | Chroma Key result on both screens. |
| 2 – Initial composition | The interactor selects character cutouts and green/red paper cutouts; a composition is assembled in the CAM1 field. | The MAX/MSP “DeepAlice CAPTURA” module captures CAM1 and sends the adjusted image to Touch Designer (via SPOUT server); Touch Designer performs the green Chroma Key. | Chroma Key result on both screens. |
| 3 – Spiral movement (for X seconds) | The initial interactor or another participant positions themselves at table 2 and rotates the spiral. | 1 – The MAX/MSP “DeepAlice CAPTURA” module captures CAM2 (table 2) and gradually applies transparency by detecting the spiral’s motion (computer-vision algorithm; see the motion-detection sketch after the table).<br>2 – The MAX/MSP “DeepAlice PROMPT1” module randomly generates a sequence composed of three characters from the two books.<br>3 – Touch Designer starts generating the AI image using the CAM1 input and the prompt generated by MAX/MSP (both prompts and parameters travel between MAX/MSP and Touch Designer via OSC). Touch Designer communicates with StreamDiffusion (a variation of Stable Diffusion), passing on the received parameters and prompts so that SD generates the video stream. After generation, the image is sent to the MAX/MSP “DeepAlice VIDEO MIXER” module for final composition, accompanied by captions (the prompts).<br>4 – The MAX/MSP “DeepAlice PROMPT1” module sends a request to ChatGPT (via Python code communicating with the OpenAI API through OSC) to translate the generated prompt from English to Portuguese. | The image generated by Touch Designer (via StreamDiffusion) is displayed on screen 2 after the table 1 imagery dissipates. The generated images undergo continuous transformations driven by variations in the table 1 video input and by ongoing changes in the AI synthesis parameters and weights.<br>Caption: on screen 1 the caption appears mirrored; on screen 2, the caption appears when the AI image is revealed. |
| 4 – Spiral movement (for X seconds) | — | State change: the prompt is now generated by the “DeepAlice PROMPT2” module. Four text segments from the two books are randomly selected and regrouped, forming a new sentence. | Image generated from more complex prompts with no clear semantic meaning.<br>Caption follows the same behavior as in the previous state. |
| 5 – Spiral movement (for X seconds) | — | The “DeepAlice PROMPT2” module generates a new nonsensical sentence and sends a request to ChatGPT (via Python/OpenAI API through OSC) to invert the characters of the generated sentence and reverse its word order (see the inversion sketch after the table). | Image generated from increasingly complex, machine-incomprehensible prompts.<br>Caption follows the same behavior as in the previous state (displaying the inverse of the inverse on screen 1 and the inverted sentence on screen 2). |
| 6 – Spiral stops | The interactor stops rotating the spiral. | The “DeepAlice VIDEO MIXER” module gradually dissipates the TD/SD-generated image, revealing again the original image produced on table 1. | Chroma Key result from table 1 is again presented on both screens. |
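Two operations in the table are easy to illustrate in isolation. First, the spiral-motion detection of state 3: the installation’s actual computer-vision algorithm runs inside MAX/MSP and is not documented here, but simple frame differencing, sketched below with OpenCV, captures the idea of turning CAM2 motion into a continuous transparency value; the camera index, blur size, and threshold are assumptions.

```python
# Frame-differencing sketch of the state-3 spiral-motion detection.
# Camera index, blur size, and threshold are assumptions.
import cv2

cap = cv2.VideoCapture(1)                 # hypothetical index for CAM2 (table 2)
prev = None
while True:
    ok, frame = cap.read()
    if not ok:
        break
    gray = cv2.GaussianBlur(cv2.cvtColor(frame, cv2.COLOR_BGR2GRAY), (21, 21), 0)
    if prev is not None:
        diff = cv2.absdiff(prev, gray)        # per-pixel change between frames
        motion = float((diff > 25).mean())    # fraction of pixels that moved
        # 'motion' (0.0 to 1.0) would drive the gradual transparency in MAX/MSP
    prev = gray
```

Second, the state-5 “character inversion” that the work requests from ChatGPT. Under the reading that the whole sentence is reversed, which flips character order and word order at once, the transformation itself is one line of Python:

```python
def invert_sentence(sentence: str) -> str:
    """Reverse the whole string: flips both character order and word order."""
    return sentence[::-1]

print(invert_sentence("Alice chased the White Rabbit"))
# -> 'tibbaR etihW eht desahc ecilA'
```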


