
Tell it where you want images placed in your scene using natural-language prompting. Experimental.
...
After some experimentation, it does seem to enhance image coherence and positional adherence. Pics below, with notes in the images.
I have added a public dataset of PNG images, with JSON metadata, that I created with Python. The images are not captioned, but they should be better than anything I found online, if anyone cares to join me in my experiments.
I used GeoShapes that were hand-labeled with just the smaller shapes and their approximate locations, written as various plain-language descriptions with no formal scheme, then I let an AI add descriptive captions after them. If I had a more formal dataset of positional images, I could do magic.
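In case anyone wants to reproduce or extend the dataset, here is a minimal sketch of how shape images and their JSON metadata could be generated with Pillow. This is not my exact script; the shape choices, field names, and paths (geoshapes/, metadata.json) are just illustrative.

```python
# Minimal sketch of generating GeoShapes-style PNGs with JSON metadata.
# Requires Pillow (pip install pillow); all names, paths, and shapes are illustrative.
import json
import random
from pathlib import Path

from PIL import Image, ImageDraw

OUT_DIR = Path("geoshapes")   # hypothetical output folder
OUT_DIR.mkdir(exist_ok=True)
CANVAS = (512, 512)
COLORS = ["red", "green", "blue", "yellow"]

def describe(cx, cy, w, h):
    """Turn a shape's center point into a rough plain-language location."""
    horiz = "left" if cx < w / 3 else "right" if cx > 2 * w / 3 else "center"
    vert = "top" if cy < h / 3 else "bottom" if cy > 2 * h / 3 else "middle"
    return "center" if (vert, horiz) == ("middle", "center") else f"{vert} {horiz}"

records = []
for i in range(100):
    img = Image.new("RGB", CANVAS, "white")
    draw = ImageDraw.Draw(img)

    # One large background shape plus one smaller shape at a random position.
    draw.rectangle([64, 64, 448, 448], fill=random.choice(COLORS))
    size = random.randint(24, 64)
    cx = random.randint(size, CANVAS[0] - size)
    cy = random.randint(size, CANVAS[1] - size)
    color = random.choice(COLORS)
    draw.ellipse([cx - size, cy - size, cx + size, cy + size], fill=color)

    name = f"geoshape_{i:04d}.png"
    img.save(OUT_DIR / name)
    records.append({
        "file": name,
        "small_shape": {"type": "circle", "color": color, "center": [cx, cy], "radius": size},
        "plain_language_position": describe(cx, cy, *CANVAS),
    })

# One JSON file holding the metadata for every image.
(OUT_DIR / "metadata.json").write_text(json.dumps(records, indent=2))
```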
From ChatGPT:
That’s a clever approach! Using GeoShapes with plain language labeling for training is an innovative start, as it provides the AI a loose "positional vocabulary" to work with. This flexibility allows it to generalize placement and context, which is why you’re seeing those natural, somewhat structured compositions.
To elevate this even further, a more formal positional dataset would be incredibly powerful. Here are a few ideas for constructing such a dataset to amplify the precision and adaptability of your model:
1. Gridded Coordinate System: Create images with shapes positioned according to a defined grid (e.g., 3x3 or 5x5). Each grid cell would represent a different placement: top-left, center-right, bottom-middle, etc. Labels would include grid-based coordinates (e.g., (1,1) for top-left), allowing for a finer positional mapping. This structured approach could help the AI learn precise placements.
2. Layered Position Descriptions: Instead of just approximate descriptions, use layered terms. For instance, combine location (e.g., top-left, bottom-center), size (small, medium, large), distance from edges, and angles (rotation, tilt). This would give the model richer metadata, allowing for more specific placements, rotations, and even layering of shapes.
3. Structured Language for Composition Rules: By integrating a formalized language structure that includes rules for composition (e.g., “centered vertically, offset slightly left horizontally”), the model could learn to arrange shapes based on relational positions—how one object should sit relative to others or the background.
4. Dynamic Background and Lighting Contexts: Incorporate variations in backgrounds and lighting to teach the model how positional accuracy changes in context. For example, adding shadows based on hypothetical light sources could make positional interpretation even more robust.
5. Use of Bounding Boxes or Anchor Points: Label images with bounding boxes or anchor points for shapes, with coordinates that specify exact placements. These markers would help the AI learn exact placements rather than just "approximate" positioning, useful for precise layouts.
6. Combined Shapes in Hierarchical Order: Introducing overlapping or adjacent shapes with a hierarchy of importance (e.g., primary, secondary) would let the model understand layered compositions. For example, a small red square within a larger green square can be tagged as "centered within green square," helping the AI learn containment and layering nuances.
Once you have this kind of dataset, you could train the AI to handle much more complex layouts, orientations, and compositions. With such a structured dataset, the potential for precise layout and arrangement—whether for graphic design, illustration, or even UI composition—would be almost endless. Let me know if you’d like to discuss any of these ideas further or explore ways to implement them!
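Suggestions 1 and 5 are the easiest to prototype. Below is a minimal sketch, in the spirit of those ideas rather than a finished tool, of mapping a shape's bounding box to a cell in a 3x3 grid and emitting both a grid coordinate and a plain-language caption; the grid size, wording, and field names are my assumptions.

```python
# Sketch of grid-based positional labels from a bounding box (suggestions 1 and 5).
# The 3x3 grid, caption wording, and field names are assumptions, not a fixed format.
GRID = 3
ROW_NAMES = ["top", "middle", "bottom"]
COL_NAMES = ["left", "center", "right"]

def grid_label(bbox, canvas=(512, 512), grid=GRID):
    """Map a bounding box (x0, y0, x1, y1) to a grid cell plus a plain-language caption."""
    x0, y0, x1, y1 = bbox
    cx, cy = (x0 + x1) / 2, (y0 + y1) / 2            # center of the box
    col = min(int(cx / canvas[0] * grid), grid - 1)  # 0 .. grid-1
    row = min(int(cy / canvas[1] * grid), grid - 1)
    caption = f"{ROW_NAMES[row]} {COL_NAMES[col]}".replace("middle center", "center")
    return {"bbox": list(bbox), "grid_cell": [row, col], "caption": caption}

# Example: a small shape whose box sits in the upper-left of a 512x512 canvas.
print(grid_label((40, 60, 120, 140)))
# {'bbox': [40, 60, 120, 140], 'grid_cell': [0, 0], 'caption': 'top left'}
```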
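Suggestions 2, 3, and 6 mostly boil down to composing richer captions from structured attributes. A rough sketch of what such a caption builder could look like, again with made-up thresholds and wording:

```python
# Sketch of layered / relational captions (suggestions 2, 3, and 6).
# Size thresholds, wording, and the containment rule are all assumptions.
def size_word(area, canvas_area, small=0.02, large=0.15):
    """Classify a shape's area relative to the canvas as small, medium, or large."""
    frac = area / canvas_area
    return "small" if frac < small else "large" if frac > large else "medium"

def contains(outer, inner):
    """True if the inner bbox lies entirely inside the outer bbox."""
    return (outer[0] <= inner[0] and outer[1] <= inner[1]
            and outer[2] >= inner[2] and outer[3] >= inner[3])

def layered_caption(primary, secondary, canvas=(512, 512)):
    """primary/secondary are dicts like {'color': ..., 'type': ..., 'bbox': (...)}."""
    canvas_area = canvas[0] * canvas[1]
    w = secondary["bbox"][2] - secondary["bbox"][0]
    h = secondary["bbox"][3] - secondary["bbox"][1]
    size = size_word(w * h, canvas_area)
    relation = "inside" if contains(primary["bbox"], secondary["bbox"]) else "next to"
    return (f"a {size} {secondary['color']} {secondary['type']} "
            f"{relation} a {primary['color']} {primary['type']}")

print(layered_caption(
    {"color": "green", "type": "square", "bbox": (64, 64, 448, 448)},
    {"color": "red", "type": "square", "bbox": (220, 220, 280, 280)},
))
# a small red square inside a green square
```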
Description:
Re-uploaded to change the name. Don't be mad if it isn't perfect or doesn't work very well lol, I don't know yet.
Trained words:
Name: Image_Positioner_v1.safetensors
Size (KB): 18809
Type: Model
Pickle scan result: Success
Pickle scan message: No Pickle imports
Virus scan result: Success
Name: myGeoShapesFromCode.zip
Size (KB): 662
Type: Training Data
Pickle scan result: Success
Pickle scan message: No Pickle imports
Virus scan result: Success