Published skills
Showing 48 of 208
bio-applied-dimensionality-reduction
scRNA-seq dimensionality reduction and clustering using PCA, k-NN graph, UMAP, and Leiden. Includes a guide for parameter selection, implementation patterns, and common pitfalls.
bio-applied-genetic-engineering-in-silico
In silico restriction digestion, compatible end detection, primer design (Tm models), and gel simulation.
bio-applied-gwas
Genome-Wide Association Studies (GWAS) using NumPy.
bio-applied-isoform-analysis
Isoform analysis with long reads, featuring Minimap2 for splice alignment, bambu for isoform discovery, and DRIMSeq for differential isoform usage.
bio-applied-trajectory-analysis
scRNA-seq trajectory analysis: pseudotime (DPT), PAGA graph abstraction, and RNA velocity (scVelo). Decision guide, key parameters, and pitfalls.
virology-bioinformatics
Viral genome assembly, intra-host variant calling, phylodynamics, and real-time surveillance.
vision-language-models
Vision-language model inference patterns for scientific documents.
rnaseq-analysis
RNA-seq differential expression analysis and normalization workflows.
string-algorithms
Pattern matching algorithms: naive, KMP (failure function), Rabin-Karp (rolling hash), and DFA-based matching for sequence search.
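As an illustration of the KMP entry, here is a minimal Python sketch of the failure function and the search loop that uses it. The function names `failure_function` and `kmp_search` are hypothetical and not taken from the skill itself. The failure table lets the search skip re-comparing characters after a mismatch, which keeps it at O(n + m).

```python
def failure_function(pattern):
    """fail[i] = length of the longest proper prefix of pattern[:i+1] that is also its suffix."""
    fail = [0] * len(pattern)
    k = 0
    for i in range(1, len(pattern)):
        # Fall back along shorter borders until the next character can extend one.
        while k > 0 and pattern[i] != pattern[k]:
            k = fail[k - 1]
        if pattern[i] == pattern[k]:
            k += 1
        fail[i] = k
    return fail


def kmp_search(text, pattern):
    """Return start indices of all (possibly overlapping) matches of pattern in text."""
    if not pattern:
        return []
    fail = failure_function(pattern)
    hits, k = [], 0
    for i, ch in enumerate(text):
        while k > 0 and ch != pattern[k]:
            k = fail[k - 1]
        if ch == pattern[k]:
            k += 1
        if k == len(pattern):
            hits.append(i - k + 1)
            k = fail[k - 1]  # continue so overlapping matches are found
    return hits
```

For example, `kmp_search("ACGACGACG", "ACGACG")` reports the overlapping hits at positions 0 and 3.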
ai-science-genomic-llms
Genomic Foundation Models: Nucleotide Transformers, HyenaDNA, and Evo with NumPy.
algo-avl-trees
A self-balancing BST (Adelson-Velsky & Landis, 1962) guaranteeing O(log n) operations via rotation-based rebalancing.
algo-dijkstra
Dijkstra's Algorithm: Shortest Paths in Weighted Graphs
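The following is a minimal sketch of Dijkstra's algorithm using Python's `heapq` as the priority queue. It assumes the graph is a plain adjacency dict of the form `{node: [(neighbor, weight), ...]}` with non-negative weights. That layout is an illustrative choice and not necessarily the one the skill uses. The sketch skips stale heap entries lazily instead of performing decrease-key.

```python
import heapq


def dijkstra(graph, source):
    """Shortest-path distances from source in a graph given as
    {node: [(neighbor, weight), ...]} with non-negative weights (assumed layout)."""
    dist = {source: 0}
    heap = [(0, source)]
    while heap:
        d, u = heapq.heappop(heap)
        if d > dist[u]:
            continue  # stale entry: a shorter path to u was already settled
        for v, w in graph.get(u, []):
            nd = d + w
            if nd < dist.get(v, float("inf")):
                dist[v] = nd
                heapq.heappush(heap, (nd, v))
    return dist
```

Nodes that cannot be reached from `source` are simply absent from the returned dict.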
algo-suffix-trees
Suffix trees are compressed tries of all suffixes, enabling O(m) pattern search and O(n) construction via Ukkonen's algorithm.
bio-applied-vdj-biology
V(D)J Recombination and Adaptive Immune Receptors
bio-applied-differential-binding
Differential binding analysis for ChIP-seq, covering DiffBind workflow, consensus peaks, normalization, and MA/volcano plots. Useful for comparing ChIP-seq signals across different conditions.
ai-science-esm2-embeddings
ESM2 Embeddings and ESMFold with NumPy
ai-science-llm-training-systems
Module T5-01B: LLM Training Systems (Tracking, Epochs, and Ablations) with Pandas
ai-science-zero-shot-mutation
Zero-Shot Mutation Effect Prediction with NumPy
algo-linked-lists
Singly linked list: a full implementation with head/tail pointers, insert, delete, search, and reverse operations, plus a complexity table.
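This minimal sketch shows why the tail pointer and in-place reversal matter. Append runs in O(1) because of the tail pointer, and `reverse` rewires the `next` pointers in O(n) time with O(1) extra space. The class and method names are hypothetical. Delete and search are omitted for brevity.

```python
class Node:
    def __init__(self, value, next=None):
        self.value = value
        self.next = next


class SinglyLinkedList:
    def __init__(self):
        self.head = None
        self.tail = None

    def append(self, value):
        """O(1) append, made possible by the tail pointer."""
        node = Node(value)
        if self.tail is None:
            self.head = self.tail = node
        else:
            self.tail.next = node
            self.tail = node

    def reverse(self):
        """Reverse in place: O(n) time, O(1) extra space."""
        prev, curr = None, self.head
        self.tail = self.head  # the old head becomes the new tail
        while curr:
            nxt = curr.next
            curr.next = prev
            prev = curr
            curr = nxt
        self.head = prev

    def to_list(self):
        out, curr = [], self.head
        while curr:
            out.append(curr.value)
            curr = curr.next
        return out
```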
algo-red-black-trees
Red-black tree: a self-balancing BST with O(log n) operations that maintains five invariants and repairs insertions with rotations and recoloring.
bio-applied-assembly-binning
Metagenomic assembly with MEGAHIT, contig binning with MetaBAT2, and MAG quality assessment with CheckM. Includes binning signals, multi-sample strategy, and MIMAG quality tiers.
algo-tabulation
Bottom-up dynamic programming (tabulation): edit distance, LCS, and space optimization with rolling arrays.
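The rolling-array idea can be illustrated with Levenshtein edit distance. This is a minimal sketch, not the skill's own code. The full (len(a)+1) × (len(b)+1) table is tabulated row by row, but only the previous row is kept, so memory drops to O(len(b)).

```python
def edit_distance(a, b):
    """Levenshtein distance by bottom-up tabulation, keeping only two rows."""
    prev = list(range(len(b) + 1))  # row for the empty prefix of a
    for i in range(1, len(a) + 1):
        curr = [i] + [0] * len(b)   # column 0: delete all i characters of a
        for j in range(1, len(b) + 1):
            cost = 0 if a[i - 1] == b[j - 1] else 1
            curr[j] = min(prev[j] + 1,         # deletion
                          curr[j - 1] + 1,     # insertion
                          prev[j - 1] + cost)  # substitution or match
        prev = curr
    return prev[len(b)]
```

The same two-row pattern applies to LCS. Only the recurrence changes.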
bio-applied-virtual-screening
Virtual screening for drug discovery: pharmacophore modeling, docking score filtering, and ADMET prediction. Use when computationally screening compound libraries.
ai-science-diffusion-generative-models
Score matching, noise schedules, DDIM sampling, and DDRM inverse problems for diffusion generative models.
ai-science-splicing-models
Splicing Models: SpliceAI and AlphaGenome with NumPy
algo-knapsack
Knapsack DP variants, including 0/1, unbounded, and subset sum, with traceback and space optimization.
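As an example of the space-optimized 0/1 variant, the following minimal sketch (hypothetical function name, traceback omitted) uses a single 1-D table. Capacity is scanned downward, so each item is counted at most once.

```python
def knapsack_01(weights, values, capacity):
    """Maximum total value with each item used at most once (1-D table)."""
    best = [0] * (capacity + 1)  # best[c] = max value achievable with capacity c
    for w, v in zip(weights, values):
        # Downward scan: best[c - w] still reflects only the earlier items.
        for c in range(capacity, w - 1, -1):
            best[c] = max(best[c], best[c - w] + v)
    return best[capacity]
```

Scanning capacity upward instead lets `best[c - w]` already include the current item, which turns the same loop into the unbounded variant.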
alphafold-structure-prediction
AlphaFold/ESMFold structure prediction and confidence interpretation.
bio-applied-copy-number-analysis
DNA copy number analysis — read depth normalization, CBS segmentation, CN state calling, and genome-wide visualization.
bio-applied-metabolic-flux
Flux balance analysis and metabolic modeling with COBRApy. Use when predicting metabolic fluxes, simulating gene knockouts, or analyzing stoichiometric models.
ai-science-geneformer-scgpt
Geneformer and scGPT for Single-Cell Modeling.
algo-aho-corasick
Multi-pattern string matching in O(n + m + z) via a trie augmented with KMP-style failure links.
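A minimal Aho-Corasick sketch in Python follows, assuming non-empty patterns. The list-of-dicts trie layout and the function name are illustrative choices, not the skill's. Failure links are set by breadth-first search, and each state's output list is merged with that of its failure target. As a result, one pass over the text reports every pattern ending at each position.

```python
from collections import deque


def aho_corasick(text, patterns):
    """Return (start_index, pattern) for every occurrence of every (non-empty) pattern."""
    goto, fail, out = [{}], [0], [[]]  # state 0 is the trie root
    for p in patterns:
        s = 0
        for ch in p:
            if ch not in goto[s]:
                goto.append({})
                fail.append(0)
                out.append([])
                goto[s][ch] = len(goto) - 1
            s = goto[s][ch]
        out[s].append(p)

    # BFS: a state's failure link points to the longest proper suffix that is also a trie path.
    queue = deque(goto[0].values())  # depth-1 states keep fail = 0 (root)
    while queue:
        s = queue.popleft()
        for ch, nxt in goto[s].items():
            queue.append(nxt)
            f = fail[s]
            while f and ch not in goto[f]:
                f = fail[f]
            fail[nxt] = goto[f].get(ch, 0)
            out[nxt] += out[fail[nxt]]  # inherit matches that end at the suffix state

    hits, s = [], 0
    for i, ch in enumerate(text):
        while s and ch not in goto[s]:
            s = fail[s]
        s = goto[s].get(ch, 0)
        for p in out[s]:
            hits.append((i - len(p) + 1, p))
    return hits
```

On the classic example `aho_corasick("ushers", ["he", "she", "his", "hers"])`, the sketch finds "she" at 1 and both "he" and "hers" at 2.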
algo-hash-tables-bloom
Hash tables (chaining vs open addressing) and Bloom filters: complexity, trade-offs, and implementation patterns.
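To make the Bloom filter trade-off concrete, here is a minimal sketch. It derives k bit positions from one SHA-256 digest by double hashing (h1 + i·h2 mod m). That hashing scheme is an illustrative assumption, not necessarily the skill's choice. The helper `bloom_parameters` applies the standard sizing formulas m = −n ln p / (ln 2)² and k = (m/n) ln 2 for n items at a target false-positive rate p.

```python
import hashlib
import math


def bloom_parameters(n, p):
    """Bit-array size m and hash count k for n items at false-positive rate p."""
    m = math.ceil(-n * math.log(p) / math.log(2) ** 2)
    k = max(1, round(m / n * math.log(2)))
    return m, k


class BloomFilter:
    """Set membership with no false negatives and a tunable false-positive rate."""

    def __init__(self, num_bits, num_hashes):
        self.m = num_bits
        self.k = num_hashes
        self.bits = bytearray((num_bits + 7) // 8)

    def _positions(self, item):
        # Double hashing from a single digest (assumed scheme): h1 + i * h2 mod m.
        digest = hashlib.sha256(item.encode("utf-8")).digest()
        h1 = int.from_bytes(digest[:8], "big")
        h2 = int.from_bytes(digest[8:16], "big") | 1  # force a nonzero (odd) step
        return [(h1 + i * h2) % self.m for i in range(self.k)]

    def add(self, item):
        for p in self._positions(item):
            self.bits[p // 8] |= 1 << (p % 8)

    def __contains__(self, item):
        return all(self.bits[p // 8] & (1 << (p % 8)) for p in self._positions(item))
```

A `True` from `in` means "possibly present", while `False` is definitive. Unlike a hash table, the filter cannot delete items or list them.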
bio-applied-chipseq-pipeline
ChIP-seq pipeline covering quality control, alignment, deduplication, peak calling using MACS2, and signal normalization with deepTools.
bio-applied-functional-annotation
Functional Annotation of Metagenomes with NumPy
ai-science-enformer-regulatory
Enformer architecture for regulatory prediction from DNA, in-silico mutagenesis (ISM), and variant prioritization.
algo-binary-search-trees
BST operations and complexity, with a clean implementation using parent pointers to support all standard operations.
atac-seq-analysis
ATAC-seq quality control and accessibility analysis.
bio-applied-cancer-transcriptomics
Cancer transcriptomics for melanoma subtype classification (Tirosh/Harbst), employing a preprocessing pipeline, PCA/t-SNE, hierarchical clustering, random forest, and Kaplan-Meier survival analysis.
bio-applied-data-harmonization
Multi-omics data harmonization, encompassing normalization strategies, missing data imputation, batch correction, and integration approaches such as MOFA2 and DIABLO.
bio-applied-epigenetic-clocks
Epigenetic Clocks and Aging Analysis with Matplotlib
algo-rabin-karp
Rabin-Karp hash-based string matching uses a rolling hash for O(n + m) average time and is well suited to multi-pattern search.
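Below is a minimal Rabin-Karp sketch. It uses a polynomial rolling hash, and its base (256) and modulus (1,000,000,007) are illustrative defaults. Each window's hash is updated in O(1) by removing the leading character and appending the next one. Hash hits are confirmed by direct comparison, so collisions cannot produce false matches.

```python
def rabin_karp(text, pattern, base=256, mod=1_000_000_007):
    """Start indices of pattern in text using a polynomial rolling hash."""
    n, m = len(text), len(pattern)
    if m == 0 or m > n:
        return []
    high = pow(base, m - 1, mod)  # weight of the leading character in a window
    p_hash = t_hash = 0
    for i in range(m):
        p_hash = (p_hash * base + ord(pattern[i])) % mod
        t_hash = (t_hash * base + ord(text[i])) % mod
    hits = []
    for i in range(n - m + 1):
        # On a hash match, compare characters to rule out collisions.
        if t_hash == p_hash and text[i:i + m] == pattern:
            hits.append(i)
        if i + m < n:
            # Roll the window: remove text[i], shift, append text[i + m].
            t_hash = ((t_hash - ord(text[i]) * high) * base + ord(text[i + m])) % mod
    return hits
```

For multi-pattern search with equal-length patterns, the usual extension puts all pattern hashes in a set and checks each window's hash against it.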
bio-applied-bio-data-formats
Quick reference for bioinformatics file formats — FASTA, FASTQ, SAM/BAM/CRAM, VCF, BED, GFF/GTF, BigWig, PDB, Newick — specs, coordinate systems, and parsing patterns.
bio-applied-clinical-genomics
Clinical genomics, covering ACMG/AMP variant classification, ClinVar queries, and clinical reporting workflows.
bio-applied-dmr-analysis
Differentially Methylated Regions (DMRs)
bio-applied-metabolite-identification
Metabolite identification from MS/MS spectra: spectral matching, molecular formula prediction, and database searching (HMDB, KEGG). Use when annotating unknown metabolites.
bio-applied-mofa2
MOFA2 is an unsupervised multi-omics factor analysis method for variance decomposition, factor interpretation, and separating shared from view-specific signal. Use it when integrating multiple omics layers.
bio-applied-molecular-modeling
Molecular Modeling with NumPy
bio-applied-ont-processing
ONT Data Processing with NumPy