GEMINI is an open-source bioinformatics tool and website, written in Python, for near-neighbor searching of genomic data. This website is currently under construction. Use the Submit a Query panel to search one of our datasets, or read through the tutorial to learn more.
GEMINI lets users search Level 3 gene expression datasets from The Cancer Genome Atlas (TCGA) more effectively by using the data itself as a query. Rather than performing a keyword search, GEMINI measures the similarity of your data to existing TCGA samples to determine the most relevant results. To submit a query, upload your data to the Query section in one of the following formats:
GEMINI organizes samples into distinct searchable datasets, such as OV for ovarian cancer samples. Once you have uploaded your query, select a dataset from the dropdown menu and click the search button.
GEMINI returns a list of the ten nearest neighbors in the dataset, based on a similarity function. Currently, GEMINI determines similarity using a combination of principal component analysis (PCA) and Euclidean distance between samples, though more sophisticated metrics will be added in the future. The results page also shows a visual representation of the nearest neighbors' top ten principal components, and future work will enable users to view and compare associated sample information, such as clinical outcomes.
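To make the nearest-neighbor step concrete, here is a minimal sketch of one common way to combine PCA with Euclidean distance: project the dataset and the query onto the dataset's leading principal components, then rank samples by distance in that reduced space. This is an illustration under stated assumptions, not GEMINI's actual implementation. The function names `fit_pca` and `nearest_neighbors`, the choice of ten components, and the synthetic random matrix standing in for expression data are all hypothetical.

```python
import numpy as np


def fit_pca(data, n_components=10):
    """Return the per-gene means and leading principal axes of `data`.

    `data` is a (samples x genes) matrix. Hypothetical helper for illustration.
    """
    mean = data.mean(axis=0)
    centered = data - mean
    # SVD of the centered matrix: rows of vt are principal axes,
    # ordered by decreasing explained variance.
    _, _, vt = np.linalg.svd(centered, full_matrices=False)
    return mean, vt[:n_components]


def nearest_neighbors(query, data, k=10, n_components=10):
    """Return indices and distances of the k samples closest to `query`.

    Distance is Euclidean, measured after projecting onto the principal
    components (an assumed way of combining the two techniques).
    """
    mean, axes = fit_pca(data, n_components)
    projected = (data - mean) @ axes.T       # dataset in PC space
    query_proj = (query - mean) @ axes.T     # query in the same space
    distances = np.linalg.norm(projected - query_proj, axis=1)
    order = np.argsort(distances)[:k]        # smallest distance first
    return order, distances[order]


# Synthetic stand-in for an expression dataset: 50 samples x 200 genes.
rng = np.random.default_rng(0)
dataset = rng.normal(size=(50, 200))

# A query that is a slightly perturbed copy of sample 7.
query = dataset[7] + rng.normal(scale=0.01, size=200)

indices, dists = nearest_neighbors(query, dataset)
print(indices)
print(dists)
```

In this sketch the query is a lightly perturbed copy of one sample, so that sample should appear first in the ranking. Fitting the components on the dataset alone, rather than on the dataset plus the query, keeps the projection stable across queries; whether GEMINI does the same is not stated here.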