Explore skills
5,474 skills found
evaluating-cosmos-policy
Evaluates NVIDIA Cosmos Policy on LIBERO and RoboCasa simulation environments. Use when setting up cosmos-policy for robot manipulation evaluation, running headless GPU evaluations with EGL rendering, or profiling inference latency on cluster or local GPU machines.
llamaindex
A data framework for building LLM applications with RAG, specializing in document ingestion (300+ connectors), indexing, and querying. It features vector indices, query engines, agents, and multi-modal support, ideal for data-centric LLM applications like document Q&A and knowledge retrieval.
blip-2-vision-language
A vision-language pre-training framework bridging frozen image encoders and LLMs. Use it for image captioning, visual question answering, image-text retrieval, or multimodal chat with state-of-the-art zero-shot performance.
llava
A Large Language and Vision Assistant that enables visual instruction tuning and image-based conversations. It combines a CLIP vision encoder with Vicuna/LLaMA language models, supporting multi-turn image chat, visual question answering, and instruction following for conversational image analysis.
creative-thinking-for-research
Applies cognitive science frameworks for creative thinking to CS and AI research ideation. Use when seeking genuinely novel research directions by leveraging combinatorial creativity, analogical reasoning, constraint manipulation, and other empirically grounded creative strategies.
faiss
Facebook's library for efficient similarity search and clustering of dense vectors. It supports billions of vectors, GPU acceleration, and various index types, making it ideal for fast k-NN search and large-scale vector retrieval in high-performance applications.
clip
OpenAI's model connecting vision and language, enabling zero-shot image classification, image-text matching, and cross-modal retrieval. Trained on 400M image-text pairs, it suits general-purpose image understanding tasks such as image search or content moderation, with no fine-tuning required.
systems-paper-writing
A comprehensive guide for writing systems papers targeting OSDI, SOSP, ASPLOS, NSDI, and EuroSys. It provides paragraph-level structural blueprints, writing patterns, venue-specific checklists, reviewer guidelines, LaTeX templates, and conference deadlines.
stable-diffusion-image-generation
State-of-the-art text-to-image generation with Stable Diffusion models via HuggingFace Diffusers. Use when generating images from text prompts, performing image-to-image translation, inpainting, or building custom diffusion pipelines.
audiocraft-audio-generation
PyTorch library for audio generation, including text-to-music (MusicGen) and text-to-sound (AudioGen). Use it to generate music from text, create sound effects, or perform melody-conditioned music generation.
sentence-transformers
A framework for state-of-the-art sentence, text, and image embeddings, offering 5000+ pre-trained models for semantic similarity, clustering, and retrieval. It supports multilingual, domain-specific, and multimodal models, ideal for generating embeddings for RAG, semantic search, or similarity tasks in production.
moe-training
Trains Mixture of Experts (MoE) models using DeepSpeed or HuggingFace. Use when training large-scale models with limited compute, implementing sparse architectures, or scaling model capacity efficiently. It covers MoE architectures, routing, load balancing, expert parallelism, and inference optimization.